Abstract

This work considers an additive noise channel where the time-$k$ noise variance is a
weighted sum of the channel input powers prior to time $k$. This channel is motivated by
point-to-point communication between two terminals that are embedded in the same chip.
Transmission heats up the entire chip and hence increases the thermal noise at the receiver.
The capacity of this channel (both with and without feedback) is studied at low transmit
powers and at high transmit powers.

At low transmit powers, the slope of the capacity-vs-power curve at zero is computed
and it is shown that the heating-up effect is beneficial. At high transmit powers, conditions
are determined under which the capacity is bounded, i.e., under which the capacity does
not grow to infinity as the allowed average power tends to infinity.
1 Introduction

Thermal heating in electronic systems is strongly related to performance limitation, aging,
reliability and safety issues. High performance-density and small physical size (area or volume)
make thermal heating important and challenging to address. This is enhanced by the trend of modern
(micro-)electronics technology to pack more and faster operations within the smallest possible
physical area in order to increase performance, reduce cost and size, and therefore expand the
potential applications of the product and make it more profitable.

Electrical power dissipation into heat raises the local temperature of the circuit; more accu- 
rately, the temperature depends on the circuit activity. The temperature influences the power 
of the intrinsic noise in the circuit which in turn reduces the effective communication or com- 
putation capacity of the circuit. This "negative" performance feedback is expected to become a 
bottleneck of future technology [T] , [5] . 

This work aims to add this dimension to our understanding of the coupling mechanism 
between communication and computation performance and thermal heating. To this end a class 
of communication channels is introduced, where the channel's noise power depends dynamically 
on the channel's activity, and its channel capacity is studied. 

To support the previous statements and motivate the mathematical development of this new
class of channels we first discuss the underlying physical mechanism that connects circuit activity
with power consumption and thermal heating. Thermal heating is unavoidable in electronic
circuits. Every circuit block converts part of the power it draws from the power supply network
(and to a certain extent from its interconnections with other blocks) into heat, which raises the
local temperature.

A circuit block in a microchip occupies a certain physical space within which heat is
distributively generated and diffused according to the heat diffusion equation (ignoring other heat
sources)

$$C_{hv} \frac{\partial T}{\partial t} = \nabla \cdot \left( \frac{1}{\rho_{thd}} \nabla T \right) + E', \tag{1}$$



The material in this paper was presented in part at the 2007 IEEE International Symposium on Information 
Theory (ISIT), Nice, France, and at the 2007 IEEE Information Theory Workshop (ITW), Lake Tahoe, CA, 
USA. 
where $C_{hv}$ is the volumetric heat capacity of the material, $\partial T/\partial t$ is the change in temperature
over time, $\nabla \cdot$ is the divergence, $\rho_{thd}$ is the distributed thermal resistance, $\nabla T$ is the temperature
gradient, and $E'$ is the power density of the added heat [3], [!].

In many cases the diffusion equation can be replaced by the corresponding ordinary
differential equation (ODE) that provides a lumped model of the thermal dynamics. Consider for
example a microchip (die), made out of material of lower thermal resistance, which is internally
heated by the activity of circuits and transfers the heat to the environment (e.g., air) which has
much higher resistance. In this case we can write

$$C_h \frac{dT}{dt} = -\frac{T - T_e}{\rho_{th}} + E, \tag{2}$$

where $C_h$ is the heat capacity of the microchip (die), $\rho_{th}$ is the thermal resistance between the
die and the environment (e.g., air), $T_e$ is the temperature of the environment, and $E$ is the
instantaneous heat generated, i.e., the electrical power converted into heat by the circuit.

Solving (2) with the assumption that at time $t = 0$ we have $T = T_e$ with $T_e$ being fixed, we
obtain

$$T(t) = T_e + \frac{1}{C_h} \int_0^t e^{-\frac{t - \xi}{\rho_{th} C_h}} E(\xi) \, d\xi, \quad t \ge 0. \tag{3}$$

If the circuit operates based on a reference clock of period $\tau$, (3) can be approximated by its
discrete version

$$T_k = T_e + \sum_{\ell=1}^{k-1} \frac{\tau}{C_h} e^{-\frac{\tau}{\rho_{th} C_h}(k - \ell)} E_\ell, \quad k \in \mathbb{Z}^+, \tag{4}$$

where $\mathbb{Z}^+$ denotes the set of positive integers, and where the sequences $\{T_k\}$ and $\{E_k\}$ are the
samples at integer multiples of $\tau$ of $T(\cdot)$ and $E(\cdot)$, respectively. Equation (4) shows the fading
memory effect of temperature. Note that (4) also captures discrete versions of distributed or
higher order lumped approximations of the diffusion equation (1).

Every electronic circuit has some intrinsically generated noise. This noise is added to the 
received signal degrading its quality. Especially in the popular class of circuits based on MOS 
transistors [5] , this noise is dominated by a thermal noise component that is stationary Gaussian, 
and in most applications it can be considered white. The variance of the thermal noise N follows 
the Johnson-Nyquist formula 

N = ATW (5) 

where W is the considered bandwidth, T is the temperature of the receiver circuit block, and A 
is a proportionality constant [5], [B], [TJ. 

The transmission of information is typically associated with dissipation of energy into heat.
Thus, in view of (4) and (5), this motivates a channel model where the variance $\theta^2$ of the additive
noise is determined by the history of the power of the transmitted signal, i.e.,

$$\theta^2(x_1, \ldots, x_{k-1}) = \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} x_\ell^2, \quad k \in \mathbb{Z}^+, \tag{6}$$

where $x_\ell$ is the transmitted symbol at time $\ell \in \mathbb{Z}^+$, and where $\sigma^2$ and $\{\alpha_\ell\}$ will be defined in
Section 2.

The rest of this paper is organized as follows. Section 2 describes the channel model in more
detail. Section 3 discusses channel capacity and lists some important properties thereof. The
main results are presented in Section 4. The proofs of the results are given in Sections 5 and 6.
Section 7 concludes with a summary.
Figure 1: A schema of the communication system.

2 Channel Model 

We consider the communication system depicted in Figure 1. The message $M$ to be transmitted
over the channel is assumed to be uniformly distributed over the set $\mathcal{M} = \{1, \ldots, |\mathcal{M}|\}$ for some
positive integer $|\mathcal{M}|$. The encoder maps the message to the length-$n$ sequence $X_1, \ldots, X_n$, where
$n$ is the block-length. In the absence of feedback, the sequence $X_1^n$ is a function of the message
$M$, i.e., $X_1^n = \phi_n(M)$ for some mapping $\phi_n \colon \mathcal{M} \to \mathbb{R}^n$. Here $A_m^n$ stands for $A_m, \ldots, A_n$,
and $\mathbb{R}$ denotes the set of real numbers. If there is a feedback link, then $X_k$, $k = 1, \ldots, n$
is not only a function of the message $M$ but also of the past channel output symbols $Y_1^{k-1}$,
i.e., $X_k = \varphi_n^{(k)}(M, Y_1^{k-1})$ for some mapping $\varphi_n^{(k)} \colon \mathcal{M} \times \mathbb{R}^{k-1} \to \mathbb{R}$. The receiver guesses the
transmitted message $M$ based on the $n$ channel output symbols $Y_1^n$, i.e., $\hat{M} = \psi_n(Y_1^n)$ for some
mapping $\psi_n \colon \mathbb{R}^n \to \mathcal{M}$.

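Conditional on $X_1 = x_1, \ldots, X_k = x_k \in \mathbb{R}$, the time-$k$ channel output $Y_k \in \mathbb{R}$ is given by

$$Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} x_\ell^2} \cdot U_k, \quad k \in \mathbb{Z}^+, \tag{7}$$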



where $\{U_k\}$ is a zero-mean, unit-variance, stationary & weakly-mixing random process, drawn
independently of $M$, and being of finite fourth moment and of finite differential entropy rate,
i.e.,

$$\mathsf{E}\left[U_k^4\right] < \infty \quad \text{and} \quad h\left(U_k \mid U_{-\infty}^{k-1}\right) > -\infty. \tag{8}$$

See [5] for a definition of weak mixing. For example, $\{U_k\}$ could be a stationary & ergodic
Gaussian process [5]. In particular, the case of most interest is when $\{U_k\}$ are independent
and identically distributed (IID), zero-mean, unit-variance Gaussian random variables, and the
reader is encouraged to focus on this case.
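
The channel law (7) can be made concrete with a short simulation. The following is a minimal sketch, assuming IID standard Gaussian $\{U_k\}$ (the case the text encourages the reader to focus on); the function names `noise_variances` and `channel_outputs` and the convention that `alpha[j - 1]` stores $\alpha_j$ are illustrative choices, not notation from the paper.

```python
import numpy as np

def noise_variances(x, alpha, sigma2):
    """Return theta_k^2 = sigma2 + sum_{l=1}^{k-1} alpha_{k-l} * x_l^2 for k = 1..n.

    alpha[j - 1] holds the coefficient alpha_j (assumed convention), and
    len(alpha) must be at least len(x) - 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta2 = np.full(n, float(sigma2))
    for k in range(1, n + 1):          # time index k as in (7)
        for l in range(1, k):          # past inputs x_1, ..., x_{k-1}
            theta2[k - 1] += alpha[k - l - 1] * x[l - 1] ** 2
    return theta2

def channel_outputs(x, alpha, sigma2, rng):
    """Draw one realization of Y_1..Y_n for inputs x, assuming IID N(0, 1) noise U_k."""
    theta2 = noise_variances(x, alpha, sigma2)
    u = rng.standard_normal(len(x))
    return np.asarray(x, dtype=float) + np.sqrt(theta2) * u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = [0.5 ** j for j in range(1, 4)]   # geometric profile with ratio 0.5
    x = [1.0, 2.0, 0.0]
    print(noise_variances(x, alpha, sigma2=1.0))
    print(channel_outputs(x, alpha, sigma2=1.0, rng=rng))
```

The noise variance at time $k$ depends only on inputs strictly before $k$, so the first symbol always sees the ambient variance $\sigma^2$.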

The parameter $\sigma^2$ is assumed to be positive. It accounts for the temperature of the device
when the transmitter is silent. The coefficients $\alpha_\ell$, $\ell \in \mathbb{Z}^+$ are nonnegative and bounded, i.e.,

$$\alpha_\ell \ge 0, \quad \ell \in \mathbb{Z}^+ \quad \text{and} \quad \sup_{\ell \in \mathbb{Z}^+} \alpha_\ell < \infty. \tag{9}$$

They characterize the dissipation of the heat produced by the transmission of the message.¹

An example for a heat dissipation profile that satisfies (9) is the geometric heat dissipation
profile where $\{\alpha_\ell\}$ is a geometric sequence, i.e.,

$$\alpha_\ell = \rho^\ell, \quad \ell \in \mathbb{Z}^+ \tag{10}$$

for some $0 < \rho < 1$.

The heat dissipation depends inter alia on the efficiency of the heat sink that is employed
in order to absorb the produced heat. In the above example (10), the heat sink's efficiency is
described by the parameter $\rho$: the smaller $\rho$, the more efficient the heat sink. In general, an
efficient heat sink is modeled by a heat dissipation profile for which the sequence $\{\alpha_\ell\}$ decays
fast.



¹ It seems reasonable to assume that the sequence $\{\alpha_\ell\}$ is monotonically nonincreasing, i.e., $\alpha_\ell \le \alpha_{\ell'}$ for
$\ell > \ell'$. This assumption is, however, not required for the results stated in this paper.
We study the above channel under an average-power constraint on the inputs, i.e., the
mappings $\phi_n$ (without feedback) and $\varphi_n^{(1)}, \ldots, \varphi_n^{(n)}$ (with feedback) are chosen such that, averaged
over the message $M$ and channel outputs $Y_1^n$, the sequence $X_1^n$ satisfies

$$\frac{1}{n} \sum_{k=1}^{n} \mathsf{E}\left[X_k^2\right] \le \mathsf{P}, \tag{11}$$

and we define the signal-to-noise ratio (SNR) as

$$\mathrm{SNR} \triangleq \frac{\mathsf{P}}{\sigma^2}. \tag{12}$$

Remark 1. The results presented in this paper do not change when (11) is replaced by a
per-message average-power constraint, i.e., when the mappings $\phi_n$ and $\varphi_n^{(1)}, \ldots, \varphi_n^{(n)}$ are chosen
such that, for each message $m \in \mathcal{M}$ and for any given sequence of output symbols $Y_1^n = y_1^n$, the
sequence $x_1^n$ satisfies

$$\frac{1}{n} \sum_{k=1}^{n} x_k^2 \le \mathsf{P}. \tag{13}$$

Indeed, all achievability results (which are based on schemes that ignore the feedback) are derived
under (13), whereas all converse results are derived under (11). Since all mappings $\phi_n$ and
$\varphi_n^{(1)}, \ldots, \varphi_n^{(n)}$ that satisfy (13) also fulfill (11), this implies that the achievability results as well
as the converse results derived in this paper hold irrespective of whether constraint (11) or (13)
is imposed.



3 Channel Capacity 

Let the rate $R$ (in nats per channel use) be defined as

$$R \triangleq \frac{\log |\mathcal{M}|}{n}, \tag{14}$$

where $\log(\cdot)$ denotes the natural logarithm function. A rate is said to be achievable if there
exists a sequence of mappings $\{\phi_n\}$ (without feedback) or $\{(\varphi_n^{(1)}, \ldots, \varphi_n^{(n)})\}$ (with feedback)
and $\{\psi_n\}$ such that the error probability $\Pr(\hat{M} \ne M)$ tends to zero as $n$ goes to infinity. The
capacity $C$ is the supremum of all achievable rates. We denote by $C(\mathrm{SNR})$ the capacity under
the input constraint (11) when there is no feedback, and we add the subscript "FB" to indicate
that there is a feedback link. Clearly

$$C(\mathrm{SNR}) \le C_{\mathrm{FB}}(\mathrm{SNR}) \tag{15}$$

as we can always ignore the feedback link.

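In the absence of feedback, the information capacity is defined as

$$C_{\mathrm{Info}}(\mathrm{SNR}) \triangleq \lim_{n \to \infty} \frac{1}{n} \sup I(X_1^n; Y_1^n), \tag{16}$$

where the supremum is over all joint distributions on $X_1, \ldots, X_n$ satisfying (11). When there is
a feedback link, then we define the information capacity as

$$C_{\mathrm{Info,FB}}(\mathrm{SNR}) \triangleq \lim_{n \to \infty} \frac{1}{n} \sup I(M; Y_1^n), \tag{17}$$

where the supremum is over all mappings $\varphi_n^{(1)}, \ldots, \varphi_n^{(n)}$ satisfying (11). By Fano's inequality
[10, Thm. 2.11.1] no rate above $C_{\mathrm{Info}}(\mathrm{SNR})$ and $C_{\mathrm{Info,FB}}(\mathrm{SNR})$ is achievable, i.e.,

$$C(\mathrm{SNR}) \le C_{\mathrm{Info}}(\mathrm{SNR}) \quad \text{and} \quad C_{\mathrm{FB}}(\mathrm{SNR}) \le C_{\mathrm{Info,FB}}(\mathrm{SNR}). \tag{18}$$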
See [11] for conditions that guarantee that $C_{\mathrm{Info}}(\mathrm{SNR})$ is achievable. Note that the channel (7)
is not stationary² since the variance of the additive noise depends on the time-index $k$. It is
therefore prima facie not clear whether the inequalities in (18) hold with equality.

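In this paper, we shall investigate the capacities $C(\mathrm{SNR})$ and $C_{\mathrm{FB}}(\mathrm{SNR})$ at low SNR and at
high SNR. To study capacity at low SNR, we compute the capacities per unit cost defined as

$$\dot{C}(0) \triangleq \sup_{\mathrm{SNR} > 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \quad \text{and} \quad \dot{C}_{\mathrm{FB}}(0) \triangleq \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}}. \tag{19}$$

It will become apparent later that the suprema in (19) are attained when SNR tends to zero.
Note that (15) implies

$$\dot{C}(0) \le \dot{C}_{\mathrm{FB}}(0). \tag{20}$$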

At high SNR, we study conditions under which capacity is unbounded in the SNR. Notice 
that when the allowed transmit power is large, then there is a trade-off between optimizing the 
present transmission and minimizing the interference to future transmissions. Indeed, increasing 
the transmission power may help to overcome the present ambient noise, but it also heats up 
the chip and thus increases the noise variance in future receptions. Prima facie it is not clear 
that, as we increase the allowed transmit power, the capacity tends to infinity. We shall see that 
this is not necessarily the case. 



4 Main Results 

Our main results are presented in the following two sections. Section 4.1 focuses on capacity at
low SNR and presents our results on the capacity per unit cost. Section 4.2 provides a sufficient
condition and a necessary condition on $\{\alpha_\ell\}$ under which capacity is bounded in the SNR.

4.1 Capacity per Unit Cost 

The results presented in this section hold under the additional assumptions that

$$\sum_{\ell=1}^{\infty} \alpha_\ell \triangleq \alpha < \infty \tag{21}$$

and that $\{U_k\}$ is IID.

Proposition 1. Consider the above channel model, and assume additionally that the sequence
$\{\alpha_\ell\}$ satisfies (21) and that $\{U_k\}$ is IID. Then

$$\sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \ge \sup_{\mathrm{SNR} > 0} \frac{C_{\alpha=0}(\mathrm{SNR})}{\mathrm{SNR}}, \tag{22}$$

where $C_{\alpha=0}(\mathrm{SNR})$ denotes the capacity of the channel

$$Y_k = x_k + \sigma \cdot U_k$$

which is a special case of (7) for $\alpha = 0$.

Proof. See Appendix A. □

This proposition demonstrates that the heating up can only increase the information capacity
per unit cost. Thus at low SNR the heating effect is harmless.

For Gaussian noise, i.e., if $\{U_k\}$ is a sequence of IID, zero-mean, unit-variance Gaussian
random variables, the heating effect is in fact beneficial.



² By a stationary channel we mean a channel where for any stationary sequence of channel inputs $\{X_k\}$ and
corresponding channel outputs $\{Y_k\}$ the pair $\{(X_k, Y_k)\}$ is jointly stationary.
Theorem 2. Consider the above channel model, and assume additionally that the sequence $\{\alpha_\ell\}$
satisfies (21) and that $\{U_k\}$ is a sequence of IID, zero-mean, unit-variance Gaussian random
variables. Then, irrespective of whether feedback is available or not, the corresponding capacity
per unit cost is given by

$$\dot{C}_{\mathrm{FB}}(0) = \dot{C}(0) = \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} = \frac{1}{2}(1 + \alpha). \tag{23}$$

Proof. See Section 5. □

For example, for the geometric heat dissipation profile (10) we obtain from Theorem 2

$$\dot{C}_{\mathrm{FB}}(0) = \dot{C}(0) = \frac{1}{2} \cdot \frac{1}{1 - \rho}, \quad 0 < \rho < 1. \tag{24}$$

Thus the capacity per unit cost is monotonically increasing in $\rho$.

The above result might be counterintuitive, because it suggests not to use heat sinks at low
SNR. Nevertheless it can be heuristically explained by noting that the heating effect increases
the channel gain.³ Indeed, if we split up the channel output

$$Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} x_\ell^2} \cdot U_k$$

into a data-dependent part

$$\tilde{X}_k = x_k + \sqrt{\sum_{\ell=1}^{k-1} \alpha_{k-\ell} x_\ell^2} \cdot U_k$$

and a data-independent part $Z_k$ (with $\{Z_k\}$ being a sequence of IID, zero-mean, variance-$\sigma^2$,
Gaussian random variables drawn independently of $\{(U_k, X_k)\}$), then the channel gain $G$ for (7)
is given by

$$G \triangleq \lim_{n \to \infty} \sup \frac{\sum_{k=1}^{n} \mathsf{E}\left[\tilde{X}_k^2\right]}{\sum_{k=1}^{n} \mathsf{E}\left[X_k^2\right]} = 1 + \sum_{\ell=1}^{\infty} \alpha_\ell = 1 + \alpha, \tag{25}$$

where the supremum is over all joint distributions on $X_1, \ldots, X_n$ satisfying (11). Thus, in view
of (25), Theorem 2 demonstrates that the capacity per unit cost is determined by the channel
gain $G$. This result is not specific to (7) but has also been observed for other channel models.
For example, the same is true for fading channels whenever the additive noise is Gaussian [13].
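
As a small numerical illustration of (23) and (24), the sketch below evaluates the capacity per unit cost $\frac{1}{2}(1+\alpha)$ for the geometric profile (10), for which $\alpha = \sum_{\ell \ge 1} \rho^\ell = \rho/(1-\rho)$. The function names are hypothetical helpers introduced only for this example.

```python
def geometric_alpha_sum(rho):
    """Sum of alpha_l = rho**l over l = 1, 2, ... for 0 < rho < 1 (geometric series)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return rho / (1.0 - rho)

def capacity_per_unit_cost(alpha_sum):
    """Capacity per unit cost (1/2)(1 + alpha) from (23), in nats per unit SNR."""
    return 0.5 * (1.0 + alpha_sum)

if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.9):
        value = capacity_per_unit_cost(geometric_alpha_sum(rho))
        # The closed form (24) is 1 / (2 * (1 - rho)); both columns should agree.
        print(rho, value, 0.5 / (1.0 - rho))
```

Since $1 + \alpha$ also equals the channel gain in (25), the same helper can be read as evaluating $G/2$.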



4.2 Conditions for Bounded Capacity 

While at low SNR the heating effect is beneficial, at high SNR it is detrimental. In fact, it
turns out that capacity can even be bounded in the SNR, i.e., the capacity does not tend to
infinity as the SNR tends to infinity. The following theorem provides a sufficient condition and a
necessary condition on $\{\alpha_\ell\}$ for the capacity to be bounded. Note that the results presented in
this section do not require the additional assumptions made in Section 4.1: we neither assume
that the sequence $\{\alpha_\ell\}$ satisfies (21) nor that $\{U_k\}$ is IID.

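Theorem 3. Consider the channel model described in Section 2. Then

$$\text{i)} \quad \left( \liminf_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0 \right) \Longrightarrow \left( \sup_{\mathrm{SNR} > 0} C_{\mathrm{FB}}(\mathrm{SNR}) < \infty \right) \tag{26}$$

$$\text{ii)} \quad \left( \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \right) \Longrightarrow \left( \sup_{\mathrm{SNR} > 0} C(\mathrm{SNR}) = \infty \right), \tag{27}$$

where we define, for any $a > 0$, $a/0 \triangleq \infty$ and $0/0 \triangleq 0$.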

3 The channel gain is given by the ratio of the "desired" power at the channel output to the "desired" power 
at the channel input. 
Proof. See Section 6. □

For example, for a geometric heat dissipation (10) we have

$$\lim_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = \rho, \quad 0 < \rho < 1$$

and it follows from Theorem 3 that the corresponding capacity is bounded. On the other hand,
for a sub-geometric heat dissipation, i.e.,

$$\alpha_\ell = \rho^{\ell^\kappa}, \quad \ell \in \mathbb{Z}^+$$

for some $0 < \rho < 1$ and $\kappa > 1$, we obtain

$$\lim_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = \lim_{\ell \to \infty} \rho^{(\ell+1)^\kappa - \ell^\kappa} = 0$$

and Theorem 3 implies that the corresponding capacity is unbounded. Roughly speaking, we
can say that whenever the sequence of coefficients $\{\alpha_\ell\}$ decays not faster than geometrically then
capacity is bounded in the SNR, and whenever the sequence of coefficients $\{\alpha_\ell\}$ decays faster
than geometrically then capacity is unbounded in the SNR.
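
The ratio conditions of Theorem 3 are easy to probe numerically for the two profiles above. The following sketch works with $\log \alpha_\ell$ to avoid floating-point underflow; the helper names and the chosen values of $\rho$ and $\kappa$ are assumptions made for illustration, and a finite-$\ell$ ratio only suggests, but does not prove, the limiting behavior.

```python
import math

def ratio(log_alpha, ell):
    """Return alpha_{ell+1} / alpha_ell, computed from a function giving log(alpha_ell)."""
    return math.exp(log_alpha(ell + 1) - log_alpha(ell))

def log_geometric(rho):
    """log of the geometric profile alpha_ell = rho**ell from (10)."""
    return lambda ell: ell * math.log(rho)

def log_faster_than_geometric(rho, kappa):
    """log of the profile alpha_ell = rho**(ell**kappa) with kappa > 1."""
    return lambda ell: (ell ** kappa) * math.log(rho)

if __name__ == "__main__":
    for ell in (1, 10, 50):
        print(ell,
              ratio(log_geometric(0.5), ell),                  # stays at rho
              ratio(log_faster_than_geometric(0.5, 2), ell))   # shrinks toward 0
```

For the geometric profile the ratio equals $\rho$ for every $\ell$, matching the left-hand side of (26); for $\kappa = 2$ the ratio is $\rho^{2\ell+1}$, which tends to zero as in (27).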

Remark 2. For Part i) of Theorem 3, the assumptions that the process $\{U_k\}$ is weakly-mixing
and that it has a finite fourth moment are not needed. These assumptions are only needed in the
proof of Part ii).⁴ In Part ii) of Theorem 3, the condition on the left-hand side (LHS) of (27)
can be replaced by

$$\lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty. \tag{28}$$

This condition (28) is weaker than the original condition (27) because

$$\left( \limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \right) \Longrightarrow \left( \lim_{\ell \to \infty} \frac{1}{\ell} \log \frac{1}{\alpha_\ell} = \infty \right).$$

When neither the LHS of (26) nor the LHS of (27) holds, i.e.,

$$\limsup_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0 \quad \text{and} \quad \liminf_{\ell \to \infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} = 0, \tag{29}$$

then capacity can be bounded or unbounded. Example 1 exhibits a sequence $\{\alpha_\ell\}$ satisfying
(29) for which the capacity is bounded, and Example 2 provides a sequence $\{\alpha_\ell\}$ satisfying (29)
for which the capacity is unbounded.⁵

Example 1. Consider the sequence $\{\alpha_\ell\}$ where all coefficients with an even index are equal to 1,
and where all coefficients with an odd index are 0. It satisfies (29) because $\limsup_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = \infty$
and $\liminf_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = 0$. Then the time-$k$ channel output $Y_k$ corresponding to the channel
inputs $(x_1, \ldots, x_k)$ is given by

$$Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{\lfloor (k-1)/2 \rfloor} x_{k-2\ell}^2} \cdot U_k, \quad k \in \mathbb{Z}^+$$



where $\lfloor \cdot \rfloor$ denotes the floor function. Thus at even times the output $Y_{2k}$, $k \in \mathbb{Z}^+$ only depends on
the "even" inputs $(X_2, X_4, \ldots, X_{2k})$, while at odd times the output $Y_{2k+1}$, $k \in \mathbb{Z}_0^+$ only depends
on the "odd" inputs $(X_1, X_3, \ldots, X_{2k+1})$. By proceeding along the lines of the proof of Part i) of
Theorem 3 while choosing in (60) $\beta = 1/y_{k-2}^2$, it can be shown that the capacity of this channel
is bounded.⁶

⁴ They are needed to prove Lemma 5.

⁵ The provided sequences $\{\alpha_\ell\}$ are not monotonically decreasing in $\ell$. Consequently, Examples 1 & 2 are rather
of mathematical than of practical interest. Nevertheless they show that when neither condition of Theorem 3 is
satisfied, then one can construct simple examples yielding a bounded capacity or an unbounded capacity, thus
demonstrating the difficulty of finding conditions that are necessary and sufficient for the capacity to be bounded.

⁶ Intuitively, with this choice of $\{\alpha_\ell\}$ the channel can be divided into two parallel channels, one connecting
the inputs and outputs at even times, and the other connecting the inputs and outputs at odd times. As both
channels have the coefficients $\tilde{\alpha}_0 = \tilde{\alpha}_1 = \ldots = 1$, it follows from Theorem 3 that the capacity of each parallel
channel is bounded and therefore also the capacity of the original channel.
Example 2. Consider the sequence $\{\alpha_\ell\}$ where all coefficients with an even positive index
are 0, and where all other coefficients are 1. (Again, we have $\limsup_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = \infty$ and
$\liminf_{\ell \to \infty} \alpha_{\ell+1}/\alpha_\ell = 0$.) In this case the time-$k$ channel output $Y_k$ corresponding to $(x_1, \ldots, x_k)$
is given by

$$Y_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=1}^{\lfloor k/2 \rfloor} x_{k-2\ell+1}^2} \cdot U_k.$$

Using Gaussian inputs of power $2\mathsf{P}$ at even times while setting the inputs to be zero at odd times,
and measuring the channel outputs only at even times, reduces the channel to a memoryless
additive noise channel and demonstrates (using the result of [15]) the achievability of

$$R = \frac{1}{4} \log(1 + 2\,\mathrm{SNR})$$

which is unbounded in the SNR.

The two seemingly similar examples thus lead to completely different capacity results. The
crucial difference between Example 1 and Example 2 is that in the former example at even
times the interference is caused by the past channel inputs at even times, whereas in the latter
example at even times the interference is caused by the past channel inputs at odd times. Thus
in Example 2 setting all "odd" inputs to zero cancels (at even times) the interference from past
channel inputs and hence transforms the channel into an additive noise channel whose capacity
is unbounded. Evidently, this approach does not work for Example 1.
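
The difference between the two examples can be checked directly from the noise-variance formula (6). The sketch below is illustrative only: the input sequence (amplitude 3 at even times, zero at odd times), the value $\sigma^2 = 1$, and the helper names are assumptions chosen for the demonstration.

```python
def noise_variance(x, alpha_of, sigma2, k):
    """theta_k^2 = sigma2 + sum_{l=1}^{k-1} alpha_{k-l} * x_l^2, with x[l - 1] = x_l."""
    return sigma2 + sum(alpha_of(k - l) * x[l - 1] ** 2 for l in range(1, k))

def alpha_example1(j):
    """Example 1: coefficients with even index are 1, with odd index are 0."""
    return 1.0 if j % 2 == 0 else 0.0

def alpha_example2(j):
    """Example 2: coefficients with even index are 0, all others are 1."""
    return 0.0 if j % 2 == 0 else 1.0

if __name__ == "__main__":
    n = 8
    # Transmit only at even times; odd-time inputs are set to zero.
    x = [3.0 if k % 2 == 0 else 0.0 for k in range(1, n + 1)]
    for k in range(2, n + 1, 2):
        print(k,
              noise_variance(x, alpha_example1, 1.0, k),   # grows with past even inputs
              noise_variance(x, alpha_example2, 1.0, k))   # stays at sigma2
```

In Example 2 the even-time noise variance stays at $\sigma^2$ regardless of the transmit power, whereas in Example 1 it accumulates the power of all earlier even-time inputs.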



5 Proof of Theorem 2

In Section 5.1 we derive an upper bound on the feedback capacity $C_{\mathrm{FB}}(\mathrm{SNR})$, and in Section 5.2
we derive a lower bound on the capacity $C(\mathrm{SNR})$ in the absence of feedback. These bounds are
used in Section 5.3 to derive an upper bound on $\dot{C}_{\mathrm{FB}}(0)$ and a lower bound on $\dot{C}(0)$, which are
then both shown to be equal to $\frac{1}{2}(1 + \alpha)$. Together with (20) this proves Theorem 2.

5.1 Converse 

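The upper bound on $C_{\mathrm{FB}}(\mathrm{SNR})$ is based on (18) and on an upper bound on $\frac{1}{n} I(M; Y_1^n)$, which
for our channel can be expressed, using the chain rule for mutual information, as

$$\begin{aligned}
\frac{1}{n} I(M; Y_1^n) &= \frac{1}{n} \sum_{k=1}^{n} \left( h\left(Y_k \mid Y_1^{k-1}\right) - h\left(Y_k \mid Y_1^{k-1}, M\right) \right) \\
&= \frac{1}{n} \sum_{k=1}^{n} \left( h\left(Y_k \mid Y_1^{k-1}\right) - h\left(Y_k \mid Y_1^{k-1}, M, X_1^k\right) \right) \\
&= \frac{1}{n} \sum_{k=1}^{n} \left( h\left(Y_k \mid Y_1^{k-1}\right) - h(U_k) - \frac{1}{2} \mathsf{E}\left[ \log\left( \sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2 \right) \right] \right),
\end{aligned} \tag{30}$$

where the second equality follows because $X_1^k$ is a function of $M$ and $Y_1^{k-1}$; and the last equality
follows from the behavior of differential entropy under translation and scaling [10, Thms. 9.6.3
& 9.6.4], and because $U_k$ is independent of $(Y_1^{k-1}, M, X_1^k)$.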

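Evaluating the differential entropy $h(U_k)$ of a Gaussian random variable, and using the trivial
lower bound $\mathsf{E}\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-1} \alpha_{k-\ell} X_\ell^2\right)\right] \ge \log \sigma^2$, we obtain the final upper bound

$$\begin{aligned}
\frac{1}{n} I(M; Y_1^n) &\le \frac{1}{n} \sum_{k=1}^{n} \left( h\left(Y_k \mid Y_1^{k-1}\right) - \frac{1}{2} \log\left(2 \pi e \sigma^2\right) \right) \\
&\le \frac{1}{n} \sum_{k=1}^{n} \frac{1}{2} \log\left( 1 + \sum_{\ell=1}^{k} \alpha_{k-\ell} \mathsf{E}\left[X_\ell^2\right] / \sigma^2 \right) \\
&\le \frac{1}{2} \log\left( 1 + \frac{1}{n} \sum_{k=1}^{n} \sum_{\ell=1}^{k} \alpha_{k-\ell} \mathsf{E}\left[X_\ell^2\right] / \sigma^2 \right) \\
&= \frac{1}{2} \log\left( 1 + \frac{1}{n} \sum_{k=1}^{n} \mathsf{E}\left[X_k^2\right] / \sigma^2 \sum_{\ell=0}^{n-k} \alpha_\ell \right) \\
&\le \frac{1}{2} \log\left( 1 + (1 + \alpha) \frac{1}{n} \sum_{k=1}^{n} \mathsf{E}\left[X_k^2\right] / \sigma^2 \right) \\
&\le \frac{1}{2} \log\left( 1 + (1 + \alpha)\, \mathrm{SNR} \right),
\end{aligned} \tag{31}$$

where we define $\alpha_0 \triangleq 1$. Here the second inequality follows because conditioning cannot
increase entropy and from the entropy maximizing property of Gaussian random variables [10,
Thm. 9.6.5]; the next inequality follows by Jensen's inequality; the following equality by
rewriting the double sum; the subsequent inequality follows because the coefficients are nonnegative
which implies that $\sum_{\ell=0}^{n-k} \alpha_\ell \le \sum_{\ell=0}^{\infty} \alpha_\ell = 1 + \alpha$; and the last inequality follows from the power
constraint (11).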
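
The "rewriting the double sum" step in (31) is the identity $\sum_{k=1}^{n} \sum_{\ell=1}^{k} \alpha_{k-\ell} p_\ell = \sum_{k=1}^{n} p_k \sum_{\ell=0}^{n-k} \alpha_\ell$ with $p_k = \mathsf{E}[X_k^2]$. The sketch below checks it numerically, together with the subsequent bound by $(1+\alpha) \sum_k p_k$; the test values and function names are arbitrary choices made for illustration, and `a[0]` plays the role of $\alpha_0 = 1$.

```python
def double_sum_by_output_time(p, a):
    """sum_{k=1}^{n} sum_{l=1}^{k} a_{k-l} * p_l, with p[l - 1] = p_l and a[j] = alpha_j."""
    n = len(p)
    return sum(a[k - l] * p[l - 1] for k in range(1, n + 1) for l in range(1, k + 1))

def double_sum_by_input_time(p, a):
    """sum_{k=1}^{n} p_k * sum_{l=0}^{n-k} a_l, the rewritten form used in (31)."""
    n = len(p)
    return sum(p[k - 1] * sum(a[: n - k + 1]) for k in range(1, n + 1))

if __name__ == "__main__":
    rho = 0.5
    p = [1.0, 2.0, 3.0, 0.5]                      # hypothetical input powers E[X_k^2]
    a = [1.0] + [rho ** j for j in range(1, 4)]   # alpha_0 = 1, then the geometric profile
    alpha = rho / (1.0 - rho)                     # full infinite sum of alpha_l, l >= 1
    lhs = double_sum_by_output_time(p, a)
    rhs = double_sum_by_input_time(p, a)
    print(lhs, rhs, (1.0 + alpha) * sum(p))       # first two agree; third is an upper bound
```

Grouping by input time makes explicit that each unit of transmitted power is counted once for the signal and then once per future symbol it heats, which is where the factor $1 + \alpha$ comes from.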

5.2 Direct Part 

As aforementioned, the above channel (7) is not stationary and it is therefore prima facie not
clear whether $C_{\mathrm{Info}}(\mathrm{SNR})$ is achievable. We shall sidestep this problem by studying the capacity
of a different channel whose time-$k$ channel output $\tilde{Y}_k \in \mathbb{R}$ is, conditional on the sequence
$\{X_k\} = \{x_k\}$, given by

$$\tilde{Y}_k = x_k + \sqrt{\sigma^2 + \sum_{\ell=-\infty}^{k-1} \alpha_{k-\ell} x_\ell^2} \cdot U_k, \quad k \in \mathbb{Z}^+ \tag{32}$$

where $\{U_k\}$ and $\{\alpha_\ell\}$ are defined in Section 2. This channel has the advantage that it is
stationary & ergodic in the sense that when $\{X_k\}$ is a stationary & ergodic process then the pair
$\{(X_k, \tilde{Y}_k)\}$ is jointly stationary & ergodic. It follows that if the sequences $\{X_k, k = 0, -1, \ldots\}$
and $\{X_k, k = 1, 2, \ldots\}$ are independent of each other, and if the random variables $X_k$, $k =
0, -1, \ldots$ are bounded, then any rate that can be achieved over this new channel is also achievable
over the original channel. Indeed, the original channel (7) can be converted into (32) by adding



$$S_k = \sqrt{\sum_{\ell=-\infty}^{0} \alpha_{k-\ell} X_\ell^2} \cdot U_{-k}$$



to the channel output $Y_k$,⁷ and, since the independence of $\{X_k, k = 0, -1, \ldots\}$ and $\{X_k, k =
1, 2, \ldots\}$ ensures that the sequence $\{S_k, k \in \mathbb{Z}^+\}$ is independent of the message $M$, it follows
that any rate achievable over (32) can be achieved over (7) by using a receiver that generates
$\{S_k, k \in \mathbb{Z}^+\}$ and then guesses $M$ based on $(Y_1 + S_1, \ldots, Y_n + S_n)$.⁸

We shall consider channel inputs $\{X_k\}$ that are blockwise IID in blocks of $L$ symbols (for
some $L \in \mathbb{Z}^+$). Thus denoting $\mathbf{X}_b = (X_{bL+1}, \ldots, X_{(b+1)L})^{\mathsf{T}}$ (where $(\cdot)^{\mathsf{T}}$ denotes the transpose),
$\{\mathbf{X}_b\}$ is a sequence of IID random length-$L$ vectors with $\mathbf{X}_b$ taking on the value $(\xi, 0, \ldots, 0)^{\mathsf{T}}$
with probability $\delta$ and $(0, \ldots, 0)^{\mathsf{T}}$ with probability $1 - \delta$, for some $\xi \in \mathbb{R}$. Note that to satisfy
the average-power constraint (11) we shall choose $\xi$ and $\delta$ so that

$$\frac{\xi^2}{\sigma^2}\, \delta = L\, \mathrm{SNR}. \tag{33}$$

⁷ The boundedness of the random variables $X_k$, $k = 0, -1, \ldots$ guarantees that the quantity $\sum_{\ell=-\infty}^{0} \alpha_{k-\ell} x_\ell^2$
is finite for any realization of $\{X_k, k = 0, -1, \ldots\}$.

⁸ Note that this approach is specific to the case where $\{U_k\}$ is a sequence of Gaussian random variables. Indeed,
it relies heavily on the fact that given $\{X_k\} = \{x_k\}$ the additive noise term on the right-hand side of (32) can
be written as the sum of two independent random variables, of which one only depends on $\{X_k, k = 0, -1, \ldots\}$
and the other only on $\{X_k, k = 1, 2, \ldots\}$. This surely holds for Gaussian random variables, but it does not
necessarily hold for other distributions on $\{U_k\}$.
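Let $\tilde{\mathbf{Y}}_b = (\tilde{Y}_{bL+1}, \ldots, \tilde{Y}_{(b+1)L})^{\mathsf{T}}$. Noting that the pair $\{(\mathbf{X}_b, \tilde{\mathbf{Y}}_b)\}$ is jointly stationary &
ergodic, it follows from [11] that the rate

$$\lim_{n \to \infty} \frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right)$$

is achievable over the new channel (32) and thus yields a lower bound on the capacity $C(\mathrm{SNR})$
of the original channel (7). We lower bound $\frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right)$ as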



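$$\begin{aligned}
\frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right) &= \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\left( \mathbf{X}_b; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \,\middle|\, \mathbf{X}_0^{b-1} \right) \\
&\ge \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\left( \mathbf{X}_b; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_0^{b-1} \right) \\
&\ge \frac{1}{n} \sum_{b=0}^{\lfloor n/L \rfloor - 1} \left( I\left( \mathbf{X}_b; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_{-\infty}^{b-1} \right) - I\left( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_0^{b} \right) \right),
\end{aligned} \tag{34}$$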



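where we use the chain rule and the nonnegativity of mutual information. It is shown in
Appendix B that

$$\lim_{b \to \infty} I\left( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_0^{b} \right) = 0. \tag{35}$$

This together with a Cesàro-type theorem [10, Thm. 4.2.3] yields

$$\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right) &\ge \frac{1}{L} I\left( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} \right) - \frac{1}{L} \lim_{n \to \infty} \frac{1}{\lfloor n/L \rfloor} \sum_{b=0}^{\lfloor n/L \rfloor - 1} I\left( \mathbf{X}_{-\infty}^{-1}; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_0^{b} \right) \\
&= \frac{1}{L} I\left( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} \right),
\end{aligned} \tag{36}$$

where the first inequality follows by the stationarity of $\{(\mathbf{X}_b, \tilde{\mathbf{Y}}_b)\}$ which implies that
$I\left( \mathbf{X}_b; \tilde{\mathbf{Y}}_b \,\middle|\, \mathbf{X}_{-\infty}^{b-1} \right)$ does not depend on $b$, and by noting that $\lim_{n \to \infty} \lfloor n/L \rfloor / n = 1/L$.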

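We proceed to analyze $I\left( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1} \right)$ for a given sequence $\mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}$. Making
use of the canonical decomposition of mutual information (e.g., [12, Eq. (10)]), we have

$$\begin{aligned}
I\left( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1} \right) &= I\left( X_1; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1} \right) \\
&= \int D\left( P_{\tilde{\mathbf{Y}}_0 | X_1 = x, \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right) dP_{X_1}(x) - D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right) \\
&= \delta\, D\left( P_{\tilde{\mathbf{Y}}_0 | X_1 = \xi, \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right) - D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right),
\end{aligned} \tag{37}$$

where the first equality follows because, for our choice of input distribution, $X_2 = \ldots = X_L = 0$
and hence $X_1$ conveys as much information about $\tilde{\mathbf{Y}}_0$ as $\mathbf{X}_0$. Here $D(\cdot \| \cdot)$ denotes relative
entropy, i.e.,

$$D(P_1 \| P_0) = \begin{cases} \displaystyle\int \log \frac{dP_1}{dP_0} \, dP_1 & \text{if } P_1 \ll P_0 \\ \infty & \text{otherwise,} \end{cases}$$

and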



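$P_{\tilde{\mathbf{Y}}_0 | X_1 = \xi, \mathbf{x}_{-\infty}^{-1}}$, $P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}}$, and $P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}}$
denote the distributions of $\tilde{\mathbf{Y}}_0$ conditional on the inputs $(X_1 = \xi, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1})$, $(X_1 =
0, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1})$, and on $\mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}$, respectively. Thus $P_{\tilde{\mathbf{Y}}_0 | X_1 = \xi, \mathbf{x}_{-\infty}^{-1}}$ is the law of an
$L$-variate Gaussian random vector of mean $(\xi, 0, \ldots, 0)^{\mathsf{T}}$ and of diagonal covariance matrix $\mathsf{K}^{(\xi)}_{\mathbf{x}_{-\infty}^{-1}}$
with diagonal entries

$$\mathsf{K}^{(\xi)}_{\mathbf{x}_{-\infty}^{-1}}(1, 1) = \sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L} x_{\ell L+1}^2,$$

$$\mathsf{K}^{(\xi)}_{\mathbf{x}_{-\infty}^{-1}}(i, i) = \sigma^2 + \alpha_{i-1} \xi^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} x_{\ell L+1}^2, \quad i = 2, \ldots, L;$$

$P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}}$ is the law of an $L$-variate, zero-mean Gaussian random vector of diagonal
covariance matrix $\mathsf{K}^{(0)}_{\mathbf{x}_{-\infty}^{-1}}$ with diagonal entries

$$\mathsf{K}^{(0)}_{\mathbf{x}_{-\infty}^{-1}}(i, i) = \sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} x_{\ell L+1}^2, \quad i = 1, \ldots, L;$$

and $P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}}$ is given by

$$P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} = \delta\, P_{\tilde{\mathbf{Y}}_0 | X_1 = \xi, \mathbf{x}_{-\infty}^{-1}} + (1 - \delta)\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}}.$$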
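In order to evaluate the first term on the right-hand side (RHS) of (37) we note that the
relative entropy of two real, $L$-variate Gaussian random vectors of means $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ and of
covariance matrices $\mathsf{K}_1$ and $\mathsf{K}_2$ is given by

$$D\left( \mathcal{N}(\boldsymbol{\mu}_1, \mathsf{K}_1) \,\middle\|\, \mathcal{N}(\boldsymbol{\mu}_2, \mathsf{K}_2) \right) = \frac{1}{2} \log \det \mathsf{K}_2 - \frac{1}{2} \log \det \mathsf{K}_1 + \frac{1}{2} \operatorname{tr}\left( \mathsf{K}_1 \mathsf{K}_2^{-1} - \mathsf{I}_L \right) + \frac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\mathsf{T}} \mathsf{K}_2^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \tag{38}$$

with $\det \mathsf{A}$ and $\operatorname{tr}(\mathsf{A})$ denoting the determinant and the trace of the matrix $\mathsf{A}$, and where $\mathsf{I}_L$
denotes the $L \times L$ identity matrix. The second term on the RHS of (37) is analyzed in the next
subsection.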
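
Formula (38) translates directly into a few lines of NumPy. The sketch below is a generic implementation for arbitrary positive-definite covariance matrices (the matrices in the proof are diagonal, which is a special case); the function name `gaussian_relative_entropy` is an illustrative choice, and the result is in nats because natural logarithms are used.

```python
import numpy as np

def gaussian_relative_entropy(mu1, K1, mu2, K2):
    """D(N(mu1, K1) || N(mu2, K2)) in nats, following (38).

    K1 and K2 are assumed to be symmetric positive-definite L x L matrices.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    K1 = np.asarray(K1, dtype=float)
    K2 = np.asarray(K2, dtype=float)
    L = mu1.shape[0]
    # slogdet returns (sign, log|det|); the sign is +1 for positive-definite input.
    _, logdet1 = np.linalg.slogdet(K1)
    _, logdet2 = np.linalg.slogdet(K2)
    K2_inv = np.linalg.inv(K2)
    diff = mu1 - mu2
    trace_term = np.trace(K1 @ K2_inv) - L       # tr(K1 K2^{-1} - I_L)
    mean_term = diff @ K2_inv @ diff             # (mu1 - mu2)^T K2^{-1} (mu1 - mu2)
    return 0.5 * (logdet2 - logdet1 + trace_term + mean_term)

if __name__ == "__main__":
    # Two-dimensional example with diagonal covariances, in the spirit of the proof:
    # the first law has mean (xi, 0) and a larger variance in the second coordinate.
    xi = 2.0
    print(gaussian_relative_entropy([xi, 0.0], np.diag([1.0, 1.0 + 0.5 * xi ** 2]),
                                    [0.0, 0.0], np.diag([1.0, 1.0])))
```

For diagonal covariance matrices the determinant and trace reduce to sums over the diagonal entries, which is how (38) is used in the derivation that follows.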



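Let $\mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right]$ denote the second term on the RHS of (37) averaged
over $\mathbf{X}_{-\infty}^{-1}$, i.e.,

$$\mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right] = \mathsf{E}_{\mathbf{X}_{-\infty}^{-1}}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}} \right) \right].$$

Then using (38) & (37) and taking expectations over $\mathbf{X}_{-\infty}^{-1}$, we obtain, again defining $\alpha_0 \triangleq 1$,

$$\begin{aligned}
\frac{1}{L} I\left( \mathbf{X}_0; \tilde{\mathbf{Y}}_0 \,\middle|\, \mathbf{X}_{-\infty}^{-1} \right) &= \frac{\delta}{L} \frac{\xi^2}{\sigma^2} \frac{1}{2} \sum_{i=1}^{L} \mathsf{E}\left[ \frac{\alpha_{i-1}}{1 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} X_{\ell L+1}^2 / \sigma^2} \right] \\
&\quad - \frac{\delta}{L} \frac{1}{2} \sum_{i=2}^{L} \mathsf{E}\left[ \log\left( 1 + \frac{\alpha_{i-1} \xi^2}{\sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} X_{\ell L+1}^2} \right) \right] \\
&\quad - \frac{1}{L} \mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right] \\
&\ge \frac{\delta}{L} \frac{\xi^2}{\sigma^2} \frac{1}{2} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} \mathsf{E}\left[ X_{\ell L+1}^2 \right] / \sigma^2} \\
&\quad - \frac{\delta}{L} \frac{1}{2} \sum_{i=2}^{L} \log\left( 1 + \alpha_{i-1} \xi^2 / \sigma^2 \right) - \frac{1}{L} \mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right] \\
&\ge \frac{1}{2} \mathrm{SNR} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \alpha\, L\, \mathrm{SNR}} - \frac{1}{2} \mathrm{SNR} \sum_{i=2}^{L} \frac{\log\left( 1 + \alpha_{i-1} \xi^2 / \sigma^2 \right)}{\xi^2 / \sigma^2} \\
&\quad - \frac{1}{L} \mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right],
\end{aligned} \tag{39}$$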
where the first inequality follows by the lower bound $\mathsf{E}[1/(1 + X)] \ge 1/(1 + \mathsf{E}[X])$, which is a
consequence of Jensen's inequality applied to the convex function $1/(1 + x)$, $x > 0$, and by the
upper bound

$$\log\left( 1 + \frac{\alpha_{i-1} \xi^2}{\sigma^2 + \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} x_{\ell L+1}^2} \right) \le \log\left( 1 + \alpha_{i-1} \xi^2 / \sigma^2 \right), \quad i = 2, \ldots, L;$$

and the second inequality follows by (33) and by upper bounding

$$\sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1} \le \sum_{\ell=1}^{\infty} \alpha_\ell = \alpha, \quad i = 1, \ldots, L.$$



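The final lower bound follows now by (39) and (36)

$$\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right) &\ge \frac{1}{2} \mathrm{SNR} \sum_{i=1}^{L} \frac{\alpha_{i-1}}{1 + \alpha\, L\, \mathrm{SNR}} - \frac{1}{2} \mathrm{SNR} \sum_{i=2}^{L} \frac{\log\left( 1 + \alpha_{i-1} \xi^2 / \sigma^2 \right)}{\xi^2 / \sigma^2} \\
&\quad - \frac{1}{L} \mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right]
\end{aligned} \tag{40}$$

and by recalling that

$$C(\mathrm{SNR}) \ge \lim_{n \to \infty} \frac{1}{n} I\left( \mathbf{X}_0^{\lfloor n/L \rfloor - 1}; \tilde{\mathbf{Y}}_0^{\lfloor n/L \rfloor - 1} \right). \tag{41}$$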



5.3 Asymptotic Analysis 

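We start with analyzing the upper bound (31). Using that $\log(1 + x) \le x$, $x > -1$ we have

$$\frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}} \le \frac{\frac{1}{2} \log\left( 1 + (1 + \alpha)\, \mathrm{SNR} \right)}{\mathrm{SNR}} \le \frac{1}{2}(1 + \alpha) \tag{42}$$

and we thus obtain

$$\dot{C}_{\mathrm{FB}}(0) = \sup_{\mathrm{SNR} > 0} \frac{C_{\mathrm{FB}}(\mathrm{SNR})}{\mathrm{SNR}} \le \frac{1}{2}(1 + \alpha). \tag{43}$$

In order to derive a lower bound on $\dot{C}(0)$ we first note that

$$\dot{C}(0) = \sup_{\mathrm{SNR} > 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \ge \lim_{\mathrm{SNR} \downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \tag{44}$$

and proceed by analyzing the limiting ratio of the lower bound (40) to SNR as SNR tends to
zero. To this end we first shall show that

$$\lim_{\mathrm{SNR} \downarrow 0} \frac{\mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right]}{\mathrm{SNR}} = 0. \tag{45}$$

We recall that for any pair of distributions $P_0$ and $P_1$ satisfying $P_1 \ll P_0$ [12, p. 1023]

$$\lim_{\beta \downarrow 0} \frac{D\left( \beta P_1 + (1 - \beta) P_0 \,\middle\|\, P_0 \right)}{\beta} = 0. \tag{46}$$

Thus, for any given $\mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}$, (46) together with $\delta = \mathrm{SNR}\, L\, \sigma^2 / \xi^2$ implies that

$$\lim_{\mathrm{SNR} \downarrow 0} \frac{D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right)}{\mathrm{SNR}} = 0. \tag{47}$$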
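In order to show that this also holds when $D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right)$ is averaged over $\mathbf{X}_{-\infty}^{-1}$,
we derive in the following the uniform upper bound

$$\sup_{\mathbf{x}_{-\infty}^{-1}} D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1}} \right) = D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1} = \mathbf{0}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1} = \mathbf{0}} \right). \tag{48}$$

The claim (45) follows then by upper bounding

$$\mathsf{E}\left[ D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{X}_{-\infty}^{-1}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{X}_{-\infty}^{-1}} \right) \right] \le D\left( P_{\tilde{\mathbf{Y}}_0 | \mathbf{x}_{-\infty}^{-1} = \mathbf{0}} \,\middle\|\, P_{\tilde{\mathbf{Y}}_0 | X_1 = 0, \mathbf{x}_{-\infty}^{-1} = \mathbf{0}} \right) \tag{49}$$

and by (47).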

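In order to prove (48) we use that any Gaussian random vector can be expressed as the sum
of two independent Gaussian random vectors to write the channel output $\mathbf{Y}_0$ as

\[
\mathbf{Y}_0 = \mathbf{X}_0 + \mathbf{V} + \mathbf{W},
\tag{50}
\]

where, conditional on $\mathbf{X}_{-\infty}^{0} = \mathbf{x}_{-\infty}^{0}$, $\mathbf{V}$ and $\mathbf{W}$ are $L$-variate, zero-mean Gaussian random
vectors, drawn independently of each other and having the respective diagonal covariance matrices
$K_{\mathbf{V}|\mathbf{x}_0}$ and $K_{\mathbf{W}|\mathbf{x}_{-\infty}^{-1}}$ whose diagonal entries are given by

\[
K_{\mathbf{V}|\mathbf{x}_0}(1,1) = \sigma^2, \qquad
K_{\mathbf{V}|\mathbf{x}_0}(i,i) = \sigma^2 + \alpha_{i-1}x_1^2, \quad i = 2,\ldots,L,
\]

and

\[
K_{\mathbf{W}|\mathbf{x}_{-\infty}^{-1}}(i,i) = \sum_{\ell=-\infty}^{-1} \alpha_{-\ell L+i-1}\, x_{\ell L+1}^2, \quad i = 1,\ldots,L.
\]

Thus $\mathbf{W}$ is the portion of the noise due to $\mathbf{X}_{-\infty}^{-1}$, and $\mathbf{V}$ is the portion of the noise that remains
after subtracting $\mathbf{W}$. Note that $\mathbf{X}_0 + \mathbf{V}$ and $\mathbf{W}$ are independent of each other because $\mathbf{X}_0$ is,
by construction, independent of $\mathbf{X}_{-\infty}^{-1}$. The upper bound (48) follows now by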



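\[
\begin{aligned}
D\left( P_{\mathbf{Y}_0|\mathbf{X}_{-\infty}^{-1}=\mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\mathbf{Y}_0|X_1=0,\mathbf{X}_{-\infty}^{-1}=\mathbf{x}_{-\infty}^{-1}} \right)
&= D\left( P_{\mathbf{X}_0+\mathbf{V}+\mathbf{W}|\mathbf{x}_{-\infty}^{-1}} \,\middle\|\, P_{\mathbf{X}_0+\mathbf{V}+\mathbf{W}|X_1=0,\mathbf{x}_{-\infty}^{-1}} \right) \\
&\le D\left( P_{\mathbf{X}_0+\mathbf{V}} \,\middle\|\, P_{\mathbf{X}_0+\mathbf{V}|X_1=0} \right) \\
&= D\left( P_{\mathbf{Y}_0|\mathbf{X}_{-\infty}^{-1}=0} \,\middle\|\, P_{\mathbf{Y}_0|X_1=0,\mathbf{X}_{-\infty}^{-1}=0} \right),
\end{aligned}
\tag{51}
\]

where $P_{\mathbf{X}_0+\mathbf{V}+\mathbf{W}|\mathbf{x}_{-\infty}^{-1}}$ and $P_{\mathbf{X}_0+\mathbf{V}+\mathbf{W}|X_1=0,\mathbf{x}_{-\infty}^{-1}}$
denote the distributions of $\mathbf{X}_0+\mathbf{V}+\mathbf{W}$ conditional on the inputs $\mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1}$ and on
$(X_1 = 0, \mathbf{X}_{-\infty}^{-1} = \mathbf{x}_{-\infty}^{-1})$, respectively; $P_{\mathbf{X}_0+\mathbf{V}}$ denotes the unconditional distribution of $\mathbf{X}_0+\mathbf{V}$; and
$P_{\mathbf{X}_0+\mathbf{V}|X_1=0}$ denotes the distribution of $\mathbf{X}_0+\mathbf{V}$ conditional on $X_1 = 0$. Here the inequality
follows by the data processing inequality for relative entropy (see [10, Sec. 2.9]) and by noting
that $\mathbf{X}_0+\mathbf{V}$ is independent of $\mathbf{X}_{-\infty}^{-1}$.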

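Returning to the analysis of the lower bound (40), we obtain from (44) and (45)

\[
\dot{C}(0) \ge \lim_{\mathrm{SNR}\downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}}
\ge \lim_{\mathrm{SNR}\downarrow 0} \left\{ \frac{1}{2}\sum_{i=1}^{L} \frac{\alpha_{i-1}}{1+\alpha L\,\mathrm{SNR}}
- \frac{1}{2}\sum_{i=2}^{L} \frac{\log\left(1+\alpha_{i-1}\xi^2/\sigma^2\right)}{\xi^2/\sigma^2} \right\}.
\tag{52}
\]

By letting first $\xi^2$ go to infinity while holding $L$ fixed, and by letting then $L$ go to infinity, we
obtain the desired lower bound on the capacity per unit cost

\[
\dot{C}(0) \ge \lim_{\mathrm{SNR}\downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \ge \frac{1}{2}(1+\alpha).
\tag{53}
\]

Thus (53), (20), and (43) yield

\[
\frac{1}{2}(1+\alpha) \le \lim_{\mathrm{SNR}\downarrow 0} \frac{C(\mathrm{SNR})}{\mathrm{SNR}} \le \dot{C}(0) \le \dot{C}_{\mathrm{FB}}(0) \le \frac{1}{2}(1+\alpha)
\tag{54}
\]

which proves Theorem 2.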
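The two-step limit behind (53) can be illustrated numerically. The sketch below evaluates the expression that remains in (52) once SNR has gone to zero, namely $\frac{1}{2}\sum_{i=1}^{L}\alpha_{i-1} - \frac{1}{2}\sum_{i=2}^{L}\log(1+\alpha_{i-1}\xi^2/\sigma^2)/(\xi^2/\sigma^2)$ with $\alpha_0 = 1$. The geometric weights $\alpha_\ell = 2^{-\ell}$ (so that $\alpha = 1$) and the function name are assumptions made for illustration only.

```python
import math

def lower_bound_52(alpha, L, x):
    """Value of the bracket in (52) after SNR -> 0.

    alpha: callable returning alpha_l for l >= 1 (alpha_0 is taken to be 1).
    L:     block length.
    x:     the ratio xi^2 / sigma^2.
    """
    coeffs = [1.0] + [alpha(l) for l in range(1, L)]   # alpha_0, ..., alpha_{L-1}
    first = 0.5 * sum(coeffs)
    second = 0.5 * sum(math.log1p(a * x) / x for a in coeffs[1:])
    return first - second

geometric = lambda l: 0.5 ** l          # hypothetical weights, sum over l >= 1 is 1
target = 0.5 * (1.0 + 1.0)              # (1/2)(1 + alpha)
value = lower_bound_52(geometric, L=40, x=1e8)
print(value, target)
```

With a large $\xi^2/\sigma^2$ the logarithmic term is negligible, and with a large $L$ the partial sum of the weights is close to $1+\alpha$, so the value approaches $\frac{1}{2}(1+\alpha)$ from below.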



6 Proof of Theorem 3

6.1 Part i) 

In order to show that

\[
\varliminf_{\ell\to\infty} \frac{\alpha_{\ell+1}}{\alpha_\ell} > 0
\tag{55}
\]



implies that the feedback capacity Cfb(SNR) is bounded, we derive a capacity upper bound 
which is based on (TTS"} and on an upper bound on —I(M; Y™). Again we define ao = 1. 
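We first note that, according to (55), we can find an $\ell_0 \in \mathbb{Z}^+$ and a $0 < \rho < 1$ so that

\[
\alpha_{\ell_0} > 0 \quad\text{and}\quad \frac{\alpha_{\ell+1}}{\alpha_\ell} \ge \rho, \quad \ell \ge \ell_0.
\tag{56}
\]

We continue with the chain rule for mutual information

\[
\frac{1}{n} I(M;Y_1^n) = \frac{1}{n}\sum_{k=1}^{\ell_0} I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right) + \frac{1}{n}\sum_{k=\ell_0+1}^{n} I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right).
\tag{57}
\]

Each summand in the first sum on the RHS of (57) is upper bounded by

\[
\begin{aligned}
I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right)
&\le h(Y_k) - h\left(Y_k \,\middle|\, Y_1^{k-1}, M\right) \\
&= h(Y_k) - \frac{1}{2} E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-1}\alpha_{k-\ell}X_\ell^2\right)\right] - h\left(U_k \,\middle|\, U_1^{k-1}\right) \\
&\le \frac{1}{2}\log\left(2\pi e\left(\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}E\left[X_\ell^2\right]\right)\right) - \frac{1}{2}\log\sigma^2 - h\left(U_k \,\middle|\, U_1^{k-1}\right) \\
&\le \frac{1}{2}\log\left(2\pi e\left(1 + \Bigl(\sup_{\ell'\in\mathbb{Z}_0^+}\alpha_{\ell'}\Bigr)\sum_{\ell=1}^{k}\frac{E\left[X_\ell^2\right]}{\sigma^2}\right)\right) - h\left(U_k \,\middle|\, U_1^{k-1}\right) \\
&\le \frac{1}{2}\log\left(2\pi e\left(1 + \Bigl(\sup_{\ell'\in\mathbb{Z}_0^+}\alpha_{\ell'}\Bigr)\, n\,\mathrm{SNR}\right)\right) - h\left(U_k \,\middle|\, U_1^{k-1}\right) \\
&\le \frac{1}{2}\log\left(2\pi e\left(1 + \Bigl(\sup_{\ell'\in\mathbb{Z}_0^+}\alpha_{\ell'}\Bigr)\, n\,\mathrm{SNR}\right)\right) - h\left(U_k \,\middle|\, U_{-\infty}^{k-1}\right).
\end{aligned}
\tag{58}
\]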



Recall that $\sup_{\ell'\in\mathbb{Z}_0^+}\alpha_{\ell'}$ is finite (9). Here the first inequality follows because conditioning
cannot increase entropy; the following equality follows because $(X_1^k, U_1^{k-1})$ is a function of
$(M, Y_1^{k-1})$, from the behavior of entropy under translation and scaling [10, Thms. 9.6.3 & 9.6.4],
and from the fact that, conditional on $U_1^{k-1}$, $U_k$ is independent of $(X_1^k, M, Y_1^{k-1})$; the subsequent
inequality follows from the entropy maximizing property of Gaussian random variables and by
lower bounding $E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-1}\alpha_{k-\ell}X_\ell^2\right)\right] \ge \log\sigma^2$; the next inequality by upper bounding
each coefficient $\alpha_\ell \le \sup_{\ell'\in\mathbb{Z}_0^+}\alpha_{\ell'}$, $\ell = 1,\ldots,k$; the subsequent inequality follows from the power
constraint (11); and the last inequality follows because conditioning cannot increase entropy.

The summands in the second sum on the RHS of (57) are upper bounded using the general
upper bound for mutual information [16, Thm. 5.1]

\[
I(X;Y) \le \int D\left(W(\cdot|x) \,\middle\|\, R(\cdot)\right) \mathrm{d}Q(x),
\tag{59}
\]

where $W(\cdot|\cdot)$ is the channel law, $Q(\cdot)$ is the distribution on the channel input $X$, and $R(\cdot)$ is
any distribution on the output alphabet. Thus any choice of output distribution yields an
upper bound on the mutual information.
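The bound (59) is easy to verify on a finite-alphabet example, where the integral becomes a sum. The sketch below uses a binary symmetric channel with crossover probability 0.1 and an input distribution chosen purely for illustration (both are assumptions, not part of the paper). It also shows that the bound is tight when $R(\cdot)$ is the true output distribution, and that the gap otherwise equals the relative entropy between the true output distribution and $R(\cdot)$.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def output_distribution(Q, W):
    """Output distribution induced by input distribution Q and channel W[x][y]."""
    return [sum(Q[x] * W[x][y] for x in range(len(Q))) for y in range(len(W[0]))]

def mutual_information(Q, W):
    """I(X;Y) written as the average divergence to the true output distribution."""
    P_Y = output_distribution(Q, W)
    return sum(Q[x] * kl_divergence(W[x], P_Y) for x in range(len(Q)))

def duality_bound(Q, W, R):
    """Right-hand side of (59) for an arbitrary output distribution R."""
    return sum(Q[x] * kl_divergence(W[x], R) for x in range(len(Q)))

W = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical binary symmetric channel
Q = [0.3, 0.7]                 # hypothetical input distribution
exact = mutual_information(Q, W)
loose = duality_bound(Q, W, [0.5, 0.5])
tight = duality_bound(Q, W, output_distribution(Q, W))
gap = kl_divergence(output_distribution(Q, W), [0.5, 0.5])
print(exact, loose, tight, gap)
```

In the proof the same freedom is used with a Cauchy output density, which is not the true output law but leads to tractable expectations.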

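We upper bound $I\left(M;Y_k \,\middle|\, Y_1^{k-1} = y_1^{k-1}\right)$, $k = \ell_0+1,\ldots,n$ for a given $Y_1^{k-1} = y_1^{k-1}$ by
choosing $R(\cdot)$ to be a Cauchy distribution whose density is given by

\[
\frac{\sqrt{\tilde\beta}}{\pi}\,\frac{1}{1+\tilde\beta y^2}, \quad y \in \mathbb{R},
\tag{60}
\]

where we choose the scale parameter $\tilde\beta$ to be⁹

\[
\tilde\beta = \frac{1}{\beta\, y_{k-\ell_0}^2}
\quad\text{and}\quad
\beta = \min\left\{ \rho^{\ell_0-1}\,\frac{\alpha_{\ell_0}}{\max_{0\le\ell'<\ell_0}\alpha_{\ell'}},\; \alpha_{\ell_0},\; \rho^{\ell_0} \right\}
\tag{61}
\]

with $0 < \rho < 1$ and $\ell_0 \in \mathbb{Z}^+$ given by (56). Note that (56) together with © implies that

\[
0 < \beta < 1 \quad\text{and}\quad \beta\alpha_\ell \le \alpha_{\ell+\ell_0}, \quad \ell \in \mathbb{Z}_0^+.
\tag{62}
\]

Applying (60) to (59) yields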



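\[
I\left(M;Y_k \,\middle|\, Y_1^{k-1}=y_1^{k-1}\right)
\le E\left[\log\left(1+\frac{Y_k^2}{\beta y_{k-\ell_0}^2}\right) \,\middle|\, Y_1^{k-1}=y_1^{k-1}\right]
+ \frac{1}{2}\log\left(\beta y_{k-\ell_0}^2\right) + \log\pi - h\left(Y_k \,\middle|\, M, Y_1^{k-1}=y_1^{k-1}\right)
\tag{63}
\]

and we thus obtain, averaging over $Y_1^{k-1}$,

\[
I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right)
\le \log\pi - h\left(Y_k \,\middle|\, Y_1^{k-1},M\right) + \frac{1}{2}E\left[\log\left(\beta Y_{k-\ell_0}^2\right)\right]
+ E\left[\log\left(\beta Y_{k-\ell_0}^2 + Y_k^2\right)\right] - E\left[\log\left(Y_{k-\ell_0}^2\right)\right] - \log\beta.
\tag{64}
\]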
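We evaluate the terms on the RHS of (64) individually. We begin with

\[
h\left(Y_k \,\middle|\, Y_1^{k-1},M\right) \ge \frac{1}{2}E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-1}\alpha_{k-\ell}X_\ell^2\right)\right] + h\left(U_k \,\middle|\, U_{-\infty}^{k-1}\right),
\tag{65}
\]

where we use the same steps as in the equality in (58) and that conditioning cannot increase
entropy. The next term is upper bounded by

\[
\begin{aligned}
\frac{1}{2}E\left[\log\left(\beta Y_{k-\ell_0}^2\right)\right]
&= \frac{1}{2}E\left[ E\left[ \log\left(\beta\left(X_{k-\ell_0} + \theta\left(X_1^{k-\ell_0-1}\right)\cdot U_{k-\ell_0}\right)^2\right) \,\middle|\, X_1^{k-\ell_0} \right] \right] \\
&\le \frac{1}{2}E\left[ \log\left(\beta X_{k-\ell_0}^2 + \beta\sigma^2 + \beta\sum_{\ell=1}^{k-\ell_0-1}\alpha_{k-\ell_0-\ell}X_\ell^2\right) \right] \\
&\le \frac{1}{2}E\left[ \log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0}\alpha_{k-\ell}X_\ell^2\right) \right],
\end{aligned}
\tag{66}
\]

where we define, for a given $X_1^{k-1} = x_1^{k-1}$,

\[
\theta\left(x_1^{k-1}\right) \triangleq \sqrt{\sigma^2 + \sum_{\ell=1}^{k-1}\alpha_{k-\ell}x_\ell^2}.
\tag{67}
\]

⁹ When $y_{k-\ell_0} = 0$ then with this choice of $\tilde\beta$ the density of the Cauchy distribution (60) is undefined. However,
this event is of zero probability and has therefore no impact on the mutual information $I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right)$.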
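Here the first inequality in (66) follows from Jensen's inequality, and the second inequality follows
from (62). Similarly we use Jensen's inequality along with (62) to upper bound

\[
\begin{aligned}
E\left[\log\left(\beta Y_{k-\ell_0}^2 + Y_k^2\right)\right]
&\le E\left[\log\left(\beta\sigma^2 + \beta\sum_{\ell=1}^{k-\ell_0}\alpha_{k-\ell_0-\ell}X_\ell^2 + \sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2\right)\right] \\
&\le \log 2 + E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2\right)\right].
\end{aligned}
\tag{68}
\]

In order to lower bound $E\left[\log\left(Y_{k-\ell_0}^2\right)\right]$ we need the following lemma: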

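Lemma 4. Let $X$ be a random variable of density $f_X(x)$, $x \in \mathbb{R}$. Then, for any $0 < \delta \le 1$ and
$0 < \eta < 1$ we have

\[
\sup_{c\in\mathbb{R}} E\left[\log|X+c|^{-1}\cdot I\{|X+c| \le \delta\}\right] \le \epsilon(\delta,\eta) + \frac{1}{\eta}h^-(X)
\tag{69}
\]

where $I\{\cdot\}$ denotes the indicator function;¹⁰ $h^-(X)$ is defined as

\[
h^-(X) \triangleq \int_{\{x\in\mathbb{R}:\, f_X(x)>1\}} f_X(x)\log f_X(x)\,\mathrm{d}x;
\tag{70}
\]

and where $\epsilon(\delta,\eta) > 0$ tends to zero as $\delta \downarrow 0$.

Proof. See [16, Lemma 6.7]. □

We write the expectation as

\[
E\left[\log\left(Y_{k-\ell_0}^2\right)\right] = E\left[ E\left[ \log\left(X_{k-\ell_0} + \theta\left(X_1^{k-\ell_0-1}\right)\cdot U_{k-\ell_0}\right)^2 \,\middle|\, X_1^{k-\ell_0} \right] \right]
\]

and lower bound the conditional expectation for a given $X_1^{k-\ell_0} = x_1^{k-\ell_0}$ by

\[
\begin{aligned}
E\left[ \log\left(x_{k-\ell_0} + \theta\left(x_1^{k-\ell_0-1}\right)\cdot U_{k-\ell_0}\right)^2 \,\middle|\, X_1^{k-\ell_0} = x_1^{k-\ell_0} \right]
&= \log\theta^2\left(x_1^{k-\ell_0-1}\right) - 2E\left[ \log\left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right|^{-1} \,\middle|\, X_1^{k-\ell_0} = x_1^{k-\ell_0} \right] \\
&\ge \log\theta^2\left(x_1^{k-\ell_0-1}\right) - 2\epsilon(\delta,\eta) - \frac{2}{\eta}h^-\left(U_{k-\ell_0}\right) + \log\delta^2
\end{aligned}
\tag{71}
\]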



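for some $0 < \delta \le 1$ and $0 < \eta < 1$. Here the inequality follows by splitting the conditional
expectation into the two expectations

\[
\begin{aligned}
E\left[ \log\left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right|^{-1} \,\middle|\, X_1^{k-\ell_0} = x_1^{k-\ell_0} \right]
&= E\left[ \log\left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right|^{-1} \cdot I\left\{ \left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right| \le \delta \right\} \,\middle|\, X_1^{k-\ell_0} = x_1^{k-\ell_0} \right] \\
&\quad + E\left[ \log\left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right|^{-1} \cdot I\left\{ \left|\frac{x_{k-\ell_0}}{\theta\left(x_1^{k-\ell_0-1}\right)} + U_{k-\ell_0}\right| > \delta \right\} \,\middle|\, X_1^{k-\ell_0} = x_1^{k-\ell_0} \right]
\end{aligned}
\]

and by upper bounding then the first term on the RHS using Lemma 4 and the second term by
$-\log\delta$. Averaging (71) over $X_1^{k-\ell_0}$ yields

\[
E\left[\log\left(Y_{k-\ell_0}^2\right)\right] \ge E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0-1}\alpha_{k-\ell_0-\ell}X_\ell^2\right)\right] - 2\epsilon(\delta,\eta) - \frac{2}{\eta}h^-\left(U_{k-\ell_0}\right) + \log\delta^2.
\tag{72}
\]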



¹⁰ The indicator function $I\{\text{statement}\}$ takes on the value 1 if the statement is true and 0 otherwise.
Note that, since $U_{k-\ell_0}$ is of unit variance, (5) together with [16, Lemma 6.4] implies that
$h^-\left(U_{k-\ell_0}\right)$ is finite.

Turning back to the upper bound (64) we obtain from (65), (66), (68), and (72)



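\[
\begin{aligned}
I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right)
&\le \log\pi - \frac{1}{2}E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-1}\alpha_{k-\ell}X_\ell^2\right)\right] - h\left(U_k \,\middle|\, U_{-\infty}^{k-1}\right)
+ \frac{1}{2}E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0}\alpha_{k-\ell}X_\ell^2\right)\right] \\
&\quad + \log 2 + E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2\right)\right]
- E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0-1}\alpha_{k-\ell_0-\ell}X_\ell^2\right)\right] \\
&\quad + 2\epsilon(\delta,\eta) + \frac{2}{\eta}h^-\left(U_{k-\ell_0}\right) - \log\delta^2 - \log\beta \\
&\le E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2\right)\right]
- E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0-1}\alpha_{k-\ell_0-\ell}X_\ell^2\right)\right] + K,
\end{aligned}
\tag{73}
\]

where

\[
K \triangleq \log\frac{2\pi}{\beta\delta^2} - h\left(U_k \,\middle|\, U_{-\infty}^{k-1}\right) + \frac{2}{\eta}h^-\left(U_{k-\ell_0}\right) + 2\epsilon(\delta,\eta)
\tag{74}
\]

is a finite constant, and where the last inequality in (73) follows because for any given
$X_1^{k-1} = x_1^{k-1}$ we have $\sum_{\ell=1}^{k-\ell_0}\alpha_{k-\ell}x_\ell^2 \le \sum_{\ell=1}^{k-1}\alpha_{k-\ell}x_\ell^2$. Note that $K$ does not depend on $k$ as the
process $\{U_k\}$ is stationary.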

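Turning back to the evaluation of the second sum on the RHS of (57), we use that for any
sequences $\{a_k\}$ and $\{b_k\}$

\[
\sum_{k=\ell_0+1}^{n}(a_k - b_k) = \sum_{k=n-2\ell_0+1}^{n}\left(a_k - b_{k-n+3\ell_0}\right) + \sum_{k=\ell_0+1}^{n-2\ell_0}\left(a_k - b_{k+2\ell_0}\right).
\tag{75}
\]

Defining

\[
a_k \triangleq E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2\right)\right]
\tag{76}
\]

and

\[
b_k \triangleq E\left[\log\left(\sigma^2 + \sum_{\ell=1}^{k-\ell_0-1}\alpha_{k-\ell_0-\ell}X_\ell^2\right)\right]
\tag{77}
\]

we have for the first sum on the RHS of (75)

\[
\sum_{k=n-2\ell_0+1}^{n}\left(a_k - b_{k-n+3\ell_0}\right)
= \sum_{k=n-2\ell_0+1}^{n} E\left[\log\frac{\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k-n+2\ell_0-1}\alpha_{k-n+2\ell_0-\ell}X_\ell^2}\right]
\le 2\ell_0\log\left(1 + \Bigl(\sup_{\ell\in\mathbb{Z}_0^+}\alpha_\ell\Bigr)\, n\,\mathrm{SNR}\right)
\tag{78}
\]

which follows by lower bounding the denominator by $\sigma^2$, and by using then Jensen's inequality
together with the third and fourth inequality in (58). For the second sum on the RHS of (75)
we have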



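\[
\begin{aligned}
\sum_{k=\ell_0+1}^{n-2\ell_0}\left(a_k - b_{k+2\ell_0}\right)
&= \sum_{k=\ell_0+1}^{n-2\ell_0} E\left[\log\frac{\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k-\ell}X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k+\ell_0-1}\alpha_{k+\ell_0-\ell}X_\ell^2}\right] \\
&\le \sum_{k=\ell_0+1}^{n-2\ell_0} E\left[\log\frac{\sigma^2 + \sum_{\ell=1}^{k}\alpha_{k+\ell_0-\ell}X_\ell^2}{\sigma^2 + \sum_{\ell=1}^{k+\ell_0-1}\alpha_{k+\ell_0-\ell}X_\ell^2}\right] - (n-3\ell_0)\log\beta \\
&\le -(n-3\ell_0)\log\beta,
\end{aligned}
\tag{79}
\]

where the first inequality follows by adding $\log\beta$ to the expectation and by upper bounding
then $\beta\alpha_\ell \le \alpha_{\ell+\ell_0}$, $\ell \in \mathbb{Z}_0^+$ (62); and the last inequality follows because for any given
$X_1^{k+\ell_0-1} = x_1^{k+\ell_0-1}$ we have $\sum_{\ell=1}^{k}\alpha_{k+\ell_0-\ell}x_\ell^2 \le \sum_{\ell=1}^{k+\ell_0-1}\alpha_{k+\ell_0-\ell}x_\ell^2$.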

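We apply now (73), (75), (78), and (79) to upper bound

\[
\frac{1}{n}\sum_{k=\ell_0+1}^{n} I\left(M;Y_k \,\middle|\, Y_1^{k-1}\right)
\le \frac{n-\ell_0}{n}K + \frac{2\ell_0}{n}\log\left(1 + \Bigl(\sup_{\ell\in\mathbb{Z}_0^+}\alpha_\ell\Bigr)\, n\,\mathrm{SNR}\right) - \frac{n-3\ell_0}{n}\log\beta
\tag{80}
\]

which together with (57) and (58) yields

\[
\frac{1}{n}I(M;Y_1^n) \le \frac{n-\ell_0}{n}K - \frac{n-3\ell_0}{n}\log\beta + \frac{\ell_0}{2n}\log(2\pi e) - \frac{\ell_0}{n}h\left(U_k \,\middle|\, U_{-\infty}^{k-1}\right)
+ \frac{5\ell_0}{2n}\log\left(1 + \Bigl(\sup_{\ell\in\mathbb{Z}_0^+}\alpha_\ell\Bigr)\, n\,\mathrm{SNR}\right).
\tag{81}
\]

This converges to $K - \log\beta < \infty$ as we let $n$ tend to infinity, thus proving that $\varliminf_{\ell\to\infty}\alpha_{\ell+1}/\alpha_\ell > 0$
implies that the capacity $C_{\mathrm{FB}}(\mathrm{SNR})$ is bounded in the SNR.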

6.2 Part ii) 

We shall show that

\[
\lim_{\ell\to\infty}\frac{1}{\ell}\log\frac{1}{\alpha_\ell} = \infty
\tag{82}
\]

implies that the capacity $C(\mathrm{SNR})$ in the absence of feedback is unbounded in the SNR. Part ii)
of Theorem 3 follows then by noting that

\[
\varlimsup_{\ell\to\infty}\frac{\alpha_{\ell+1}}{\alpha_\ell} = 0 \quad\Longrightarrow\quad \lim_{\ell\to\infty}\frac{1}{\ell}\log\frac{1}{\alpha_\ell} = \infty.
\tag{83}
\]

We prove the claim by proposing a coding scheme that achieves an unbounded rate. We first
note that (82) implies that for any $0 < \varrho < 1$ we can find an $\ell_0 \in \mathbb{Z}^+$ so that

\[
\alpha_\ell < \varrho^\ell, \quad \ell \ge \ell_0.
\tag{84}
\]

If there exists an $\ell_0 \in \mathbb{Z}^+$ so that $\alpha_\ell = 0$, $\ell \ge \ell_0$, then we can achieve the (unbounded) rate

\[
R = \frac{1}{2L}\log(1 + L\,\mathrm{SNR}), \quad L \ge \ell_0
\tag{85}
\]

by a coding scheme where the channel inputs $\{X_{kL+1}, k \in \mathbb{Z}_0^+\}$ are IID, zero-mean Gaussian
random variables of variance $LP$, and where the other inputs are deterministically zero. Indeed,
by waiting $L$ time-steps, the chip's temperature cools down to the ambient one so that the noise
variance is independent of the previous channel inputs and we can achieve (after appropriate
normalization) the capacity of the additive white Gaussian noise (AWGN) channel [15].

For the more general case (84) we propose the following encoding and decoding scheme.
Let $x_1^n(m)$, $m \in \mathcal{M}$ denote the codeword sent out by the transmitter that corresponds to the
message $M = m$. We choose some $L \ge \ell_0$ and generate the components $x_{kL+1}(m)$, $m \in \mathcal{M}$,
$k = 0,\ldots,\lfloor n/L\rfloor - 1$ independently of each other according to a zero-mean Gaussian law of
variance $P$. The other components are set to zero.¹¹

The receiver uses a nearest neighbor decoder in order to guess $M$ based on the received
sequence of channel outputs $y_1^n$. Thus it computes $\|\mathbf{y} - \mathbf{x}(m')\|^2$ for each $m' \in \mathcal{M}$ and decides
on the message that satisfies

\[
\hat{M} = \arg\min_{m'\in\mathcal{M}} \|\mathbf{y} - \mathbf{x}(m')\|^2,
\tag{86}
\]



¹¹ It follows from the weak law of large numbers that, for any $m \in \mathcal{M}$, $\frac{1}{n}\sum_{k=1}^{n}x_k^2(m)$ converges to $P/L$ in
probability as $n$ tends to infinity. This guarantees that the probability that a codeword does not satisfy the
per-message power constraint (13), and hence also the average-power constraint (11), vanishes as $n$ tends to
infinity.
where ties are resolved with a fair coin flip. Here, $\|\cdot\|$ denotes the Euclidean
norm, and $\mathbf{y}$ and $\mathbf{x}(m')$ denote the respective vectors $\left(y_1, y_{L+1}, \ldots, y_{(\lfloor n/L\rfloor-1)L+1}\right)^{\mathsf{T}}$ and
$\left(x_1(m'), x_{L+1}(m'), \ldots, x_{(\lfloor n/L\rfloor-1)L+1}(m')\right)^{\mathsf{T}}$.
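The encoding and decoding scheme just described can be sketched in a few lines. The following is a toy simulation under stated assumptions, not the paper's construction in full: the noise process is taken to be IID standard Gaussian, the weights are the hypothetical choice $\alpha_\ell = 2^{-\ell^2}$ (which satisfies (82)), time indices are 0-based so that the active positions are $0, L, 2L, \ldots$, the block count divides $n$, and ties (a zero-probability event here) are broken by taking the first minimizer rather than by a coin flip.

```python
import math
import random

def theta(x, k, alpha, sigma2):
    """Noise standard deviation at 0-based time k given the past inputs x[0..k-1]."""
    return math.sqrt(sigma2 + sum(alpha(k - l) * x[l] ** 2 for l in range(k)))

def transmit(x, alpha, sigma2, rng):
    """Pass the codeword x through the heating-up channel (IID Gaussian U assumed)."""
    return [x[k] + theta(x, k, alpha, sigma2) * rng.gauss(0.0, 1.0) for k in range(len(x))]

def make_codebook(num_messages, n, L, P, rng):
    """Every L-th component is drawn from N(0, P); all other components are zero."""
    return [[rng.gauss(0.0, math.sqrt(P)) if k % L == 0 else 0.0 for k in range(n)]
            for _ in range(num_messages)]

def decode(y, codebook, L):
    """Nearest neighbor rule (86), using only the outputs at the active positions."""
    def distance(codeword):
        return sum((y[k] - codeword[k]) ** 2 for k in range(0, len(y), L))
    return min(range(len(codebook)), key=lambda m: distance(codebook[m]))

rng = random.Random(1)
alpha = lambda l: 0.5 ** (l * l)        # hypothetical weights decaying faster than geometrically
L, n, P, sigma2 = 4, 200, 4.0, 1.0
codebook = make_codebook(4, n, L, P, rng)
sent = 2
received = transmit(codebook[sent], alpha, sigma2, rng)
decoded = decode(received, codebook, L)
print(sent, decoded)
```

With only four codewords and fifty active symbols the pairwise distances dwarf the noise, so the decoder recovers the message; the proof below concerns the much harder regime where the number of codewords grows exponentially in $n$.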

We are interested in the average probability of error $\Pr(\hat{M} \ne M)$, averaged over all codewords
in the codebook, and averaged over all codebooks. By the symmetry of the codebook
construction, the probability of error corresponding to the $m$-th message $\Pr(\hat{M} \ne M \mid M = m)$
does not depend on $m$, and we thus conclude that $\Pr(\hat{M} \ne M) = \Pr(\hat{M} \ne M \mid M = 1)$. We
further note that



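\[
\Pr\left(\hat{M} \ne M \,\middle|\, M = 1\right) \le \Pr\left( \bigcup_{m'=2}^{|\mathcal{M}|} \left\{ \|\mathbf{Y} - \mathbf{X}(m')\|^2 \le \|\mathbf{Z}\|^2 \right\} \,\middle|\, M = 1 \right)
\tag{87}
\]

where

\[
\mathbf{Z} = \left( \theta\left(X_1^{0}(1)\right)\cdot U_1,\; \theta\left(X_1^{L}(1)\right)\cdot U_{L+1},\; \ldots,\; \theta\left(X_1^{(\lfloor n/L\rfloor-1)L}(1)\right)\cdot U_{(\lfloor n/L\rfloor-1)L+1} \right)^{\mathsf{T}}
\]

and where $\|\mathbf{Z}\|^2$ is, conditional on $M = 1$, equal to $\|\mathbf{Y} - \mathbf{X}(1)\|^2$. In order to analyze (87) we need the
following lemma.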

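Lemma 5. Consider the channel described in Section 2 and assume that $\{\alpha_\ell\}$ satisfies (82).
Further assume that $\{X_{kL+1}, k \in \mathbb{Z}_0^+\}$ is a sequence of IID, zero-mean Gaussian random
variables of variance $P$, and that $X_k = 0$ if $k \bmod L \ne 1$ (where $k \bmod L$ stands for the remainder
upon dividing $k$ by $L$). Let the set $\mathcal{D}_\epsilon$ be defined as

\[
\mathcal{D}_\epsilon \triangleq \left\{ (\mathbf{y},\mathbf{z}) \in \mathbb{R}^{\lfloor n/L\rfloor}\times\mathbb{R}^{\lfloor n/L\rfloor} :
\left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{y}\|^2 - \left(\sigma^2 + P + \alpha^{(L)}P\right)\right| < \epsilon,\;
\left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{z}\|^2 - \left(\sigma^2 + \alpha^{(L)}P\right)\right| < \epsilon \right\}
\]

with $\alpha^{(L)}$ being defined as

\[
\alpha^{(L)} \triangleq \sum_{\ell=1}^{\infty}\alpha_{\ell L}.
\tag{89}
\]

Then

\[
\lim_{n\to\infty}\Pr\left((\mathbf{Y},\mathbf{Z}) \in \mathcal{D}_\epsilon\right) = 1
\tag{90}
\]

for any $\epsilon > 0$.

Proof. See Appendix C. □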



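In order to upper bound the RHS of (87) we proceed along the lines of [15], [14]. We have

\[
\Pr\left( \bigcup_{m'=2}^{|\mathcal{M}|} \left\{ \|\mathbf{Y} - \mathbf{X}(m')\|^2 \le \|\mathbf{Z}\|^2 \right\} \,\middle|\, M = 1 \right)
\le \Pr\left((\mathbf{Y},\mathbf{Z}) \notin \mathcal{D}_\epsilon\right)
+ \int_{\mathcal{D}_\epsilon} \Pr\left( \bigcup_{m'=2}^{|\mathcal{M}|} \left\{ \|\mathbf{y} - \mathbf{X}(m')\|^2 \le \|\mathbf{z}\|^2 \right\} \,\middle|\, (\mathbf{y},\mathbf{z}), M = 1 \right) \mathrm{d}P(\mathbf{y},\mathbf{z}),
\tag{91}
\]

where we use that, by the symmetry of the codebook construction, the law of $(\mathbf{Y},\mathbf{Z})$ does not
depend on $M$. It follows from Lemma 5 that the first term on the RHS of (91) vanishes as $n$
tends to infinity. Since the codewords are independent of each other, conditional on $M = 1$, the
distribution of $\mathbf{X}(m')$, $m' = 2,\ldots,|\mathcal{M}|$ does not depend on $(\mathbf{y},\mathbf{z})$. We upper bound the second
term on the RHS of (91) by analyzing $\Pr\left(\|\mathbf{y}-\mathbf{X}(m')\|^2 \le \|\mathbf{z}\|^2 \,\middle|\, (\mathbf{y},\mathbf{z}), M = 1\right)$, $m' = 2,\ldots,|\mathcal{M}|$
and by applying then the union of events bound.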
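For $m' = 2,\ldots,|\mathcal{M}|$, we have

\[
\Pr\left(\|\mathbf{y}-\mathbf{X}(m')\|^2 \le \|\mathbf{z}\|^2 \,\middle|\, (\mathbf{y},\mathbf{z})\right)
\le \exp\left\{ -s\lfloor n/L\rfloor\left(\sigma^2 + \alpha^{(L)}P + \epsilon\right) + \frac{s\|\mathbf{y}\|^2}{1-2sP} - \frac{1}{2}\lfloor n/L\rfloor\log(1-2sP) \right\}, \quad (\mathbf{y},\mathbf{z}) \in \mathcal{D}_\epsilon
\tag{92}
\]

for any $s < 0$. This follows by upper bounding $\|\mathbf{z}\|^2$ by $\lfloor n/L\rfloor\left(\sigma^2 + \alpha^{(L)}P + \epsilon\right)$ and from Chernoff's
bound [11, Sec. 5.4]. Using that, for $(\mathbf{y},\mathbf{z}) \in \mathcal{D}_\epsilon$,

\[
\|\mathbf{y}\|^2 \ge \lfloor n/L\rfloor\left(\sigma^2 + P + \alpha^{(L)}P - \epsilon\right)
\]

it follows from the union of events bound and from (92) that (91) goes to zero as $n$ tends to
infinity if for some $s < 0$ the rate $R$ satisfies

\[
R < \frac{s}{L}\left(\sigma^2 + \alpha^{(L)}P + \epsilon\right) + \frac{1}{2L}\log(1-2sP) - \frac{s}{L}\,\frac{\sigma^2 + P + \alpha^{(L)}P - \epsilon}{1-2sP}.
\tag{93}
\]

Thus choosing $s = -\frac{1}{2}\cdot\frac{1}{1+\alpha^{(L)}P}$ yields that any rate below

\[
-\frac{1}{2L}\,\frac{\sigma^2 + \alpha^{(L)}P + \epsilon}{1+\alpha^{(L)}P}
+ \frac{1}{2L}\log\left(1 + \frac{P}{1+\alpha^{(L)}P}\right)
+ \frac{1}{2L}\,\frac{\sigma^2 + P + \alpha^{(L)}P - \epsilon}{1+\alpha^{(L)}P}\cdot\frac{1}{1+\frac{P}{1+\alpha^{(L)}P}}
\tag{94}
\]

is achievable. As $P$ tends to infinity this converges to

\[
\frac{1}{2L}\log\left(1 + \frac{1}{\alpha^{(L)}}\right) > \frac{1}{2L}\log\frac{1}{\alpha^{(L)}}.
\tag{95}
\]

It remains to show that given (84) we can make $-\frac{1}{2L}\log\alpha^{(L)}$ arbitrarily large. Indeed, (84)
implies that

\[
\alpha^{(L)} = \sum_{\ell=1}^{\infty}\alpha_{\ell L} < \sum_{\ell=1}^{\infty}\varrho^{\ell L} = \frac{\varrho^L}{1-\varrho^L}
\]

and (95) can therefore be further lower bounded by

\[
\frac{1}{2L}\log\left(1-\varrho^L\right) + \frac{1}{2}\log\frac{1}{\varrho}.
\tag{96}
\]

Letting $L$ tend to infinity yields then that we can achieve any rate below $\frac{1}{2}\log\frac{1}{\varrho}$. As this can be
made arbitrarily large by choosing $\varrho$ sufficiently small, we conclude that $\lim_{\ell\to\infty}\frac{1}{\ell}\log\frac{1}{\alpha_\ell} = \infty$
implies that the capacity is unbounded.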
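The role of the decay rate in the high-power limit (95) can be seen numerically. The sketch below evaluates $\frac{1}{2L}\log\bigl(1 + 1/\alpha^{(L)}\bigr)$ for two hypothetical weight sequences: $\alpha_\ell = e^{-\ell^2}$, which satisfies (82), and the geometric sequence $\alpha_\ell = 2^{-\ell}$, which does not. The infinite sum defining $\alpha^{(L)}$ is truncated, and the helper names are illustrative only.

```python
import math

def alpha_L(alpha, L, terms=200):
    """Truncated version of alpha^(L) = sum over l >= 1 of alpha(l * L)."""
    return sum(alpha(l * L) for l in range(1, terms + 1))

def high_power_rate(alpha, L):
    """The limit (95), (1/(2L)) * log(1 + 1/alpha^(L)), in nats per channel use."""
    return math.log1p(1.0 / alpha_L(alpha, L)) / (2 * L)

fast_decay = lambda l: math.exp(-float(l * l))   # satisfies (82)
geometric = lambda l: 0.5 ** l                   # decays exactly geometrically

spacings = [1, 2, 5, 10, 20]
fast_rates = [high_power_rate(fast_decay, L) for L in spacings]
geo_rates = [high_power_rate(geometric, L) for L in spacings]
for L, rf, rg in zip(spacings, fast_rates, geo_rates):
    print(f"L = {L:2d}   fast decay: {rf:7.3f}   geometric: {rg:7.3f}")
```

For the fast-decaying weights the value grows roughly like $L/2$, so a larger spacing between active symbols buys an ever larger rate. For the geometric weights it stays at $\frac{1}{2}\log 2$ for every $L$, which is consistent with (though of course not a proof of) the boundedness result of Part i).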



7 Conclusion 

We studied a model for on-chip communication with nonideal heat sinks. To account for the
heating-up effect we proposed a channel model where the variance of the additive noise depends
on a weighted sum of the past channel input powers. The weights characterize the efficiency of
the heat sink.

To study the capacity of this channel at low SNR, we computed the capacity per unit cost.
We showed that the heating-up effect is not merely harmless but can even be beneficial, in the sense
that the capacity per unit cost can be larger than the capacity per unit cost of a corresponding
channel with an ideal heat sink, i.e., where the weights describing the dependency of the noise
variance on the channel input powers are zero. This suggests that at low SNR no heat sinks
should be used.

Studying capacity at high SNR, we derived a sufficient condition and a necessary condition 
on the weights for the capacity to be bounded in the SNR. We showed that when the sequence of 
weights decays not faster than geometrically, then capacity is bounded in the SNR. On the other 
hand, if the sequence of weights decays faster than geometrically, then capacity is unbounded in 
the SNR. This result demonstrates the importance of an efficient heat sink at high SNR. 
A Proof of Proposition 1

We first note that by the expression of the capacity per unit cost of a memoryless channel [12]
we have

\[
\sup_{\mathrm{SNR}>0}\frac{C_{\alpha=0}(\mathrm{SNR})}{\mathrm{SNR}} = \sup_{\xi^2>0}\frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2},
\tag{97}
\]

where $W_{\alpha=0}(\cdot|\cdot)$ denotes the channel law of the channel

\[
Y_k = x_k + \sigma\cdot U_k.
\tag{98}
\]

Thus to prove Proposition 1 it suffices to show that

\[
\sup_{\mathrm{SNR}>0}\frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \ge \sup_{\xi^2>0}\frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2}.
\]

We shall obtain this result by deriving a lower bound on $C_{\mathrm{Info}}(\mathrm{SNR})$ and by computing then its
limiting ratio to SNR as SNR tends to zero.

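In order to lower bound $C_{\mathrm{Info}}(\mathrm{SNR})$, which was defined in (16) as

\[
C_{\mathrm{Info}}(\mathrm{SNR}) = \lim_{n\to\infty}\frac{1}{n}\sup I\left(X_1^n;Y_1^n\right),
\]

we evaluate $\frac{1}{n}I\left(X_1^n;Y_1^n\right)$ for inputs $\{X_k\}$ that are blockwise IID in blocks of $L$ symbols (for some
$L \in \mathbb{Z}^+$). Thus $\{(X_{bL+1},\ldots,X_{(b+1)L}), b \in \mathbb{Z}_0^+\}$ is a sequence of IID random length-$L$ vectors
with $(X_{bL+1},\ldots,X_{(b+1)L})$ taking on the value $(\xi,0,\ldots,0)$ with probability $\delta$ and $(0,\ldots,0)$ with
probability $1-\delta$, for some $\xi \in \mathbb{R}$. To satisfy the power constraint (11) we shall choose $\xi$ and $\delta$
such that

\[
\frac{\xi^2}{\sigma^2}\,\delta = L\,\mathrm{SNR}.
\tag{99}
\]

We use the chain rule for mutual information to write

\[
\begin{aligned}
\frac{1}{n}I\left(X_1^n;Y_1^n\right) &= \frac{1}{n}\sum_{b=0}^{\lfloor n/L\rfloor-1} I\left(X_{bL+1};Y_1^n \,\middle|\, X_1^{bL}\right) \\
&\ge \frac{1}{n}\sum_{b=0}^{\lfloor n/L\rfloor-1} I\left(X_{bL+1};Y_{bL+1} \,\middle|\, X_1^{bL}\right),
\end{aligned}
\tag{100}
\]

where the inequality follows because reducing observations cannot increase mutual information.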

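Let $R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ denote the maximum rate achievable on (98) using on-off keying with on-symbol
$\xi$ and with its corresponding probability $p$ chosen in order to satisfy the power constraint
snr, i.e.,

\[
R^{(\xi)}_{\text{on-off}}(\mathrm{snr}) = \sup_{\substack{P_X(\xi) = 1-P_X(0) = p, \\ p\,\xi^2/\sigma^2 \le \mathrm{snr}}} I\left(X; X+\sigma\cdot U_k\right), \quad \mathrm{snr} \ge 0.
\tag{101}
\]

Notice that $R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$, $\mathrm{snr} \ge 0$ is a nonnegative, monotonically nondecreasing function of snr
with $R^{(\xi)}_{\text{on-off}}(0) = 0$. From the strict concavity of mutual information it follows that $R^{(\xi)}_{\text{on-off}}(\mathrm{snr}) > 0$
whenever $\mathrm{snr} > 0$. Also, for a fixed $\xi$, $\mathrm{snr} \mapsto R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ is concave in snr. Consequently, for
some $\mathrm{snr}_0 > 0$, the function $\mathrm{snr} \mapsto R^{(\xi)}_{\text{on-off}}(\mathrm{snr})$ is strictly monotonic in the interval $\mathrm{snr} \in [0,\mathrm{snr}_0]$,
and hence the supremum on the RHS of (101) is attained for $p = \mathrm{snr}\,\sigma^2/\xi^2$, $\mathrm{snr} \in [0,\mathrm{snr}_0]$.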
By writing $I\left(X_{bL+1};Y_{bL+1} \,\middle|\, X_1^{bL} = x_1^{bL}\right)$ for a given $X_1^{bL} = x_1^{bL}$ as

l{X hL+l ;Y bL+1 \X\ L = xf) = l(X bL+1 :X bL+1 +9(x b 1 L )-U bL+1 ) 



a 



— I yXbL + l, bL ^ X b L + l + er • UbL+1 j 

(with 6(x\ L ) defined in (|67[) ) . and by using that for snr S [0,snro] the supremum on the RHS of 
()101jl is attained for p = snr <r 2 /£ 2 we obtain 



l{X bL+1 ;Y bL+1 \x\ L =x\ L ) =R®4 S [ _ 6 _ 1 LSNR 2 — ], SNR G [0, SNRq], (102) 

1 + 22e=o a lb-e)Lxi L+1 /(T 2 



where SNRo = snro/L. Averaging over X\ L and combining with () 1 00(1 yields 

Ln/Lj-l 



n n ^ — ' 



n 

b=0 



R (0 

■"-on-off 



L SNR 



, 1 + EfcO a (b-£)LX$ L+1 /<T 2 



^^-(iTlSfv)' SNR € [0, SNRo], (103, 

where the second inequality follows by upper bounding Efco a (b-i)LX 2 L+l / a 2 < 

Efci a eL^ 2 /<^ 2 , and by using that snr i— > ^on-off( snr ) ^ s monotonically increasing in snr. The 
lower bound on Ci n f (SNR) follows then by letting n tend to infinity 

( L SNR 



C Info (SNR) = lim -J(X»; Y?) > -R^_ oS — ™ — T . (104) 



With this we can lower bound the information capacity per unit cost as 

MSM) > Hm Ci nfo (SNR) 
snr>o SNR ~ SNRiO SNR 

p(£) / L SNR 

1 "on-off I a«t« 2 /^ 2 

> hm — 



snrio L SNR 

/ L SNR. ^ 

hm 



lim 



4£ off (SNR') 



snr' 10 SNR' 1 + Efci"^ 2 /^ 2 ' 



(105) 



where the first inequality follows by lower bounding the supremum by the limit; and where the 
last equality follows by substituting SNR' = S a^^ h' 1 ' 

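Proceeding along the lines of the proof of [12, Thm. 3], it can be shown that

\[
\lim_{\mathrm{SNR}'\downarrow 0}\frac{R^{(\xi)}_{\text{on-off}}(\mathrm{SNR}')}{\mathrm{SNR}'} = \frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2}
\tag{106}
\]

and therefore

\[
\sup_{\mathrm{SNR}>0}\frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \ge \frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2}\cdot\frac{1}{1+\sum_{\ell=1}^{\infty}\alpha_{\ell L}\,\xi^2/\sigma^2}.
\tag{107}
\]

Noting that © & (21) imply

\[
0 \le \lim_{L\to\infty}\sum_{\ell=1}^{\infty}\alpha_{\ell L} \le \lim_{L\to\infty}\sum_{\ell=L}^{\infty}\alpha_\ell = 0
\tag{108}
\]

we obtain by letting $L$ tend to infinity

\[
\sup_{\mathrm{SNR}>0}\frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \ge \frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2}.
\tag{109}
\]

Maximizing (109) over $\xi^2$ yields then

\[
\sup_{\mathrm{SNR}>0}\frac{C_{\mathrm{Info}}(\mathrm{SNR})}{\mathrm{SNR}} \ge \sup_{\xi^2>0}\frac{D\left(W_{\alpha=0}(\cdot|\xi) \,\middle\|\, W_{\alpha=0}(\cdot|0)\right)}{\xi^2/\sigma^2}
\tag{110}
\]

which, in view of (97), proves Proposition 1.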
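The effect of letting the block length $L$ grow in the lower bound (107) can be illustrated for one special case. The sketch below assumes Gaussian noise, for which $D\bigl(\mathcal{N}(\xi,\sigma^2)\,\|\,\mathcal{N}(0,\sigma^2)\bigr)/(\xi^2/\sigma^2) = 1/2$, and hypothetical geometric weights $\alpha_\ell = q^\ell$, for which $\sum_{\ell\ge 1}\alpha_{\ell L} = q^L/(1-q^L)$. Both choices are assumptions made here; the proposition itself is not restricted to them.

```python
def per_unit_cost_bound(L, xi2_over_sigma2, q=0.5):
    """Right-hand side of (107) for Gaussian noise and weights alpha_l = q**l.

    The Gaussian divergence ratio equals 1/2 for every on-symbol, and the
    heating penalty is 1 / (1 + alpha_sum * xi^2 / sigma^2).
    """
    gaussian_ratio = 0.5
    alpha_sum = q ** L / (1.0 - q ** L)      # sum over l >= 1 of alpha_{l L}
    return gaussian_ratio / (1.0 + alpha_sum * xi2_over_sigma2)

block_lengths = (1, 2, 5, 10, 20, 40)
bounds = [per_unit_cost_bound(L, 10.0) for L in block_lengths]
for L, b in zip(block_lengths, bounds):
    print(f"L = {L:2d}   bound = {b:.9f}")
```

As $L$ grows the penalty factor tends to one, which is the step from (107) to (109): spacing the on-symbols far apart removes the influence of past transmissions on the noise seen by the next on-symbol.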



B Appendix to Section 5.2



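We shall prove that

\[
\lim_{b\to\infty} I\left(\mathbf{X}_{-\infty}^{-1};\mathbf{Y}_b \,\middle|\, \mathbf{X}_0^b\right) = 0.
\tag{111}
\]

Let $\alpha_b^{(i)}$ be defined as

\[
\alpha_0^{(1)} \triangleq 0
\tag{112}
\]

\[
\alpha_b^{(i)} \triangleq \alpha_{bL+i-1}, \quad (b,i) \in \mathbb{Z}_0^+\times\mathbb{Z}^+\setminus\{(0,1)\}.
\tag{113}
\]

We have

\[
\begin{aligned}
I\left(\mathbf{X}_{-\infty}^{-1};\mathbf{Y}_b \,\middle|\, \mathbf{X}_0^b\right)
&= \sum_{i=1}^{L} I\left(\mathbf{X}_{-\infty}^{-1};Y_{bL+i} \,\middle|\, \mathbf{X}_0^b, Y_{bL+1}^{bL+i-1}\right) \\
&\le \sum_{i=1}^{L}\left( h\left(Y_{bL+i} \,\middle|\, \mathbf{X}_0^b\right) - h\left(Y_{bL+i} \,\middle|\, \mathbf{X}_{-\infty}^b\right) \right) \\
&\le \frac{1}{2}\sum_{i=1}^{L} E\left[\log\left((2\pi e)\left(\sigma^2 + \sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2 + PL\sum_{\ell=b+1}^{\infty}\alpha_\ell^{(i)}\right)\right)\right] \\
&\quad - \frac{1}{2}\sum_{i=1}^{L} E\left[\log\left((2\pi e)\left(\sigma^2 + \sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2 + \sum_{\ell=-\infty}^{-1}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2\right)\right)\right] \\
&\le \frac{1}{2}\sum_{i=1}^{L} E\left[\log\left((2\pi e)\left(\sigma^2 + \sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2 + PL\sum_{\ell=b+1}^{\infty}\alpha_\ell^{(i)}\right)\right)\right] \\
&\quad - \frac{1}{2}\sum_{i=1}^{L} E\left[\log\left((2\pi e)\left(\sigma^2 + \sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2\right)\right)\right] \\
&= \frac{1}{2}\sum_{i=1}^{L} E\left[\log\left(1 + \frac{PL\sum_{\ell=b+1}^{\infty}\alpha_\ell^{(i)}}{\sigma^2 + \sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2}\right)\right] \\
&\le \frac{1}{2}\sum_{i=1}^{L}\log\left(1 + L\,\mathrm{SNR}\sum_{\ell=b+1}^{\infty}\alpha_\ell^{(i)}\right),
\end{aligned}
\tag{114}
\]

where the first inequality follows because conditioning cannot increase entropy and because,
conditional on $\mathbf{X}_{-\infty}^{b}$, $Y_{bL+i}$ is independent of $Y_{bL+1}^{bL+i-1}$; the next inequality follows from the
entropy maximizing property of Gaussian random variables; the subsequent inequality follows
because $\sum_{\ell=-\infty}^{-1}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2 \ge 0$, $i = 1,\ldots,L$; and the last inequality follows because
$\sum_{\ell=0}^{b}\alpha_{b-\ell}^{(i)}X_{\ell L+1}^2 \ge 0$, $i = 1,\ldots,L$.
By upper bounding

\[
\sum_{\ell=b+1}^{\infty}\alpha_\ell^{(i)} \le \sum_{\ell=b+1}^{\infty}\alpha_\ell, \quad i = 1,\ldots,L
\tag{115}
\]

we obtain

\[
I\left(\mathbf{X}_{-\infty}^{-1};\mathbf{Y}_b \,\middle|\, \mathbf{X}_0^b\right) \le \frac{L}{2}\log\left(1 + L\,\mathrm{SNR}\sum_{\ell=b+1}^{\infty}\alpha_\ell\right),
\]

and (111) follows by noting that (21) implies

\[
\lim_{b\to\infty}\sum_{\ell=b+1}^{\infty}\alpha_\ell = 0.
\]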

C Proof of Lemma 5



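We shall show that for any $\epsilon > 0$

\[
\lim_{n\to\infty}\Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2 - \left(\sigma^2 + P + \alpha^{(L)}P\right)\right| > \epsilon \right) = 0
\tag{117}
\]

and

\[
\lim_{n\to\infty}\Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2 - \left(\sigma^2 + \alpha^{(L)}P\right)\right| > \epsilon \right) = 0.
\tag{118}
\]

Lemma 5 follows then by the union of events bound.
In order to prove (117) & (118), we first note that

\[
\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Y}\|^2\right] = \sigma^2 + P + P\,\frac{1}{\lfloor n/L\rfloor}\sum_{k=1}^{\lfloor n/L\rfloor-1}\sum_{\ell=1}^{k}\alpha_{\ell L}
\]

\[
\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Z}\|^2\right] = \sigma^2 + P\,\frac{1}{\lfloor n/L\rfloor}\sum_{k=1}^{\lfloor n/L\rfloor-1}\sum_{\ell=1}^{k}\alpha_{\ell L}
\]

and therefore, by Cesàro's mean [10, Thm. 4.2.3],

\[
\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Y}\|^2\right] = \sigma^2 + P + \alpha^{(L)}P
\]

\[
\lim_{n\to\infty}\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Z}\|^2\right] = \sigma^2 + \alpha^{(L)}P
\]

where $\alpha^{(L)}$ was defined in (89) as

\[
\alpha^{(L)} = \sum_{\ell=1}^{\infty}\alpha_{\ell L}.
\]

Thus, for any $\epsilon > 0$ and $0 < \epsilon' < \epsilon$, there exists an $n_0$ such that for all $n \ge n_0$

\[
\left|\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Y}\|^2\right] - \left(\sigma^2 + P + \alpha^{(L)}P\right)\right| < \epsilon'
\]

\[
\left|\frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Z}\|^2\right] - \left(\sigma^2 + \alpha^{(L)}P\right)\right| < \epsilon'
\]

and it follows from the triangle inequality that

\[
\left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2 - \left(\sigma^2 + P + \alpha^{(L)}P\right)\right| < \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2 - \frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Y}\|^2\right]\right| + \epsilon'
\]

\[
\left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2 - \left(\sigma^2 + \alpha^{(L)}P\right)\right| < \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2 - \frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Z}\|^2\right]\right| + \epsilon'.
\]

From this we obtain

\[
\begin{aligned}
\Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2 - \left(\sigma^2 + P + \alpha^{(L)}P\right)\right| > \epsilon \right)
&\le \Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2 - \frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Y}\|^2\right]\right| > \epsilon - \epsilon' \right) \\
&\le \frac{\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\right)}{(\epsilon-\epsilon')^2}
\end{aligned}
\tag{127}
\]

and

\[
\begin{aligned}
\Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2 - \left(\sigma^2 + \alpha^{(L)}P\right)\right| > \epsilon \right)
&\le \Pr\left( \left|\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2 - \frac{1}{\lfloor n/L\rfloor}E\left[\|\mathbf{Z}\|^2\right]\right| > \epsilon - \epsilon' \right) \\
&\le \frac{\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2\right)}{(\epsilon-\epsilon')^2}
\end{aligned}
\tag{128}
\]

with $\mathrm{Var}(A) = E\left[(A - E[A])^2\right]$ denoting the variance of $A$. Here the last inequalities in (127) &
(128) follow from Chebyshev's inequality [11, Sec. 5.4].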
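It remains to show that

\[
\lim_{n\to\infty}\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\right) = 0
\quad\text{and}\quad
\lim_{n\to\infty}\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Z}\|^2\right) = 0.
\tag{129}
\]

We shall prove (129) for $\mathbf{Y}$. The proof for $\mathbf{Z}$ follows along the same lines. We begin by writing
$\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\right)$ as

\[
\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\right)
= \frac{1}{(\lfloor n/L\rfloor)^2}\sum_{k=0}^{\lfloor n/L\rfloor-1}\mathrm{Var}\left(Y_{kL+1}^2\right)
+ \frac{2}{(\lfloor n/L\rfloor)^2}\sum_{\substack{k=1,\,j=0 \\ k>j}}^{\lfloor n/L\rfloor-1}\mathrm{Cov}\left(Y_{kL+1}^2, Y_{jL+1}^2\right)
\tag{130}
\]

where $\mathrm{Cov}(A,B) = E\left[(A - E[A])(B - E[B])\right]$ denotes the covariance between $A$ and $B$. We shall
evaluate both terms on the RHS of (130) separately. For the sake of clarity, we shall omit the
details of the derivations and show only the main steps. Unless otherwise stated these steps can
be derived in a straightforward way using that

i) $\{X_{kL+1}, k \in \mathbb{Z}_0^+\}$ is a sequence of IID, zero-mean, variance-$P$ Gaussian random variables
whose fourth moments are given by $3P^2$, while all odd moments are zero;

ii) $X_k = 0$ if $k \bmod L \ne 1$;

iii) $\{U_k\}$ (and hence also $\{U_{kL+1}, k \in \mathbb{Z}_0^+\}$) is a zero-mean, unit-variance, stationary &
weakly-mixing random process;

iv) and that $\{X_k\}$ and $\{U_k\}$ are independent of each other.

For the first sum on the RHS of (130) it suffices to show that $\mathrm{Var}\left(Y_{kL+1}^2\right) < \infty$, $k \in \mathbb{Z}_0^+$.
Indeed, this sum contains only $\lfloor n/L\rfloor$ summands and hence, when divided by $(\lfloor n/L\rfloor)^2$, this sum
vanishes as $n$ tends to infinity, given that $\mathrm{Var}\left(Y_{kL+1}^2\right) < \infty$, $k \in \mathbb{Z}_0^+$. We have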

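\[
\begin{aligned}
\mathrm{Var}\left(Y_{kL+1}^2\right) &= E\left[Y_{kL+1}^4\right] - \left(E\left[Y_{kL+1}^2\right]\right)^2 \\
&\le E\left[Y_{kL+1}^4\right] \\
&= E\left[\left(X_{kL+1} + \theta\left(X_1^{kL}\right)\cdot U_{kL+1}\right)^4\right] \\
&= 3P^2 + 6P\left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right)
+ \left(\sigma^4 + 2\sigma^2 P\sum_{\ell=1}^{k}\alpha_{\ell L} + 2P^2\sum_{\ell=1}^{k}\alpha_{\ell L}^2 + P^2\Bigl(\sum_{\ell=1}^{k}\alpha_{\ell L}\Bigr)^2\right)E\left[U_{kL+1}^4\right] \\
&\le 3P^2 + 6P\left(\sigma^2 + P\alpha^{(L)}\right)
+ \left(\sigma^4 + 2\sigma^2 P\alpha^{(L)} + 2P^2\sum_{\ell=1}^{\infty}\alpha_{\ell L}^2 + P^2\bigl(\alpha^{(L)}\bigr)^2\right)E\left[U_{kL+1}^4\right]
\end{aligned}
\tag{131}
\]

where the second inequality follows by upper bounding $\sum_{\ell=1}^{k}\alpha_{\ell L} \le \alpha^{(L)}$. Note that (84) implies
that $\alpha^{(L)}$ and $\sum_{\ell=1}^{\infty}\alpha_{\ell L}^2$ are bounded. It follows therefore by noting that $U_{kL+1}$ has a finite fourth
moment that (for a finite $P$)

\[
\mathrm{Var}\left(Y_{kL+1}^2\right) < \infty.
\tag{132}
\]

In order to show that the second term on the RHS of (130) vanishes as $n$ tends to infinity,
we shall evaluate

\[
\mathrm{Cov}\left(Y_{kL+1}^2, Y_{jL+1}^2\right) = E\left[Y_{kL+1}^2 Y_{jL+1}^2\right] - E\left[Y_{kL+1}^2\right]E\left[Y_{jL+1}^2\right]
\]

for $k \in \mathbb{Z}^+$, $j \in \mathbb{Z}_0^+$, $k > j$. We have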

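\[
\begin{aligned}
E\left[Y_{kL+1}^2 Y_{jL+1}^2\right]
&= E\left[\left(X_{kL+1} + \theta\left(X_1^{kL}\right)\cdot U_{kL+1}\right)^2\left(X_{jL+1} + \theta\left(X_1^{jL}\right)\cdot U_{jL+1}\right)^2\right] \\
&= P^2 + P\left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right) + 2P^2\alpha_{(k-j)L} + P\left(\sigma^2 + P\sum_{\ell=1}^{j}\alpha_{\ell L}\right) \\
&\quad + \left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right)\left(\sigma^2 + P\sum_{\ell=1}^{j}\alpha_{\ell L}\right)E\left[U_{kL+1}^2 U_{jL+1}^2\right] \\
&\quad + 2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+k-j)L}\,E\left[U_{kL+1}^2 U_{jL+1}^2\right].
\end{aligned}
\tag{133}
\]

Evaluating

\[
E\left[Y_{kL+1}^2\right]E\left[Y_{jL+1}^2\right]
= P^2 + P\left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right) + P\left(\sigma^2 + P\sum_{\ell=1}^{j}\alpha_{\ell L}\right)
+ \left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right)\left(\sigma^2 + P\sum_{\ell=1}^{j}\alpha_{\ell L}\right)
\tag{134}
\]

we obtain from (133) & (134)

\[
\begin{aligned}
\mathrm{Cov}\left(Y_{kL+1}^2, Y_{jL+1}^2\right)
&= 2P^2\alpha_{(k-j)L} + 2P^2\sum_{\ell=1}^{j}\alpha_{\ell L}\,\alpha_{(\ell+k-j)L}\,E\left[U_{kL+1}^2 U_{jL+1}^2\right] \\
&\quad + \left(\sigma^2 + P\sum_{\ell=1}^{k}\alpha_{\ell L}\right)\left(\sigma^2 + P\sum_{\ell=1}^{j}\alpha_{\ell L}\right)\left(E\left[U_{kL+1}^2 U_{jL+1}^2\right] - 1\right).
\end{aligned}
\tag{135}
\]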
Summing over $k$ and $j$ and dividing by $(\lfloor n/L\rfloor)^2$ yields

o [n/Al-l 

X; c v(y^ L+1 ,^ 2 L+1 ) 



(Ln/Lj)s 



fc=l,j=0 



Ln/iJ-l 



(Ln/Lj) 2 



X 2P 2 « (fe _ j)i+ 2P 2 X 



k>j 



o 2 + a 2 + P X (E[^ + i^ + i] - 1) 



'=i 



Ln/iJ-2 [n/ij-l-j 



(K£J) 2 



X E 2P 2 a, L + 2P 2 X«^+,)LE[[/ 2 i+1 f/ 2 ] 



3=0 i/=l 



e=i 

/ j+v \ 

a 2 + Pj2 aiL 

k e=i / 



a 2 + ?J2a ilL ) (E[U 2 L+1 U 2 }-1) 



2G 



2 [n/Lj-2 [n/Lj-l-j 

= j^m? E 2p2 «^ 

VL ' J; 3=0 v=l 

2 Ln/iJ-2 Ln/iJ-l-J 3 

+ JUJZ\Y E E 2P 2 ^^ L a ( ^ )L E[[/2 i+1 [/ 1 2 ] 

Ln/iJ-2 Ln/iJ-l-j / j+i/ \ / j \ 

+ ([^1)2 E E ^ 2 +pE^J(- 2 + p E^J( e [^ + i^]-i) ! 

(136) 

where the second equality follows by substituting $\nu = k - j$ and from the stationarity of $\{U_k\}$.
The first two terms on the RHS of (136) can be upper bounded using

\[
\alpha_\ell < \varrho^\ell, \quad 0 < \varrho < 1, \quad \ell \ge \ell_0.
\]

Indeed, noting that $L \ge \ell_0$, we have

[n/ZJ-l-j Ln/^J-l-J L«/£J 

E < E q vL < E e" £ ( 137 ) 



and 



[n/H-l-j j [n/L\-l-j j ^ 

^2 «fi«(«+,)i < E E (^ 2L ) e" L 

i/=i t=i v=\ e=i 

Y n / L \ oo 

< EE0> 2L )V L 

v=l 1=1 



2L l n / L i 



1 - g 2i 



Consequently with (|13T[) we can upper bound the first term on the RHS of (|136[) as 

Ln/iJ-2 Ln/LJ-l-j 2 |n/£J-2 L»/£J 

- E E "'«<7i^™ E Ef 



(\n/L\) 2 ^ " (|n/LJ, 

I / r I 1 i L«/£J 

= 4P i n i 771 E e" L . ( 139 ) 

and it follows from Cesaro's mean that this upper bound tends to zero as n tends to infinity. 
Likewise with (|138j) we can upper bound the second term on the RHS of (|136| as 

2 ln/L}-2\n/L}-l-j j 

(\ n/L \)2 E E 2?2 ^Z a ^a {l+u)L E[Ul L+1 Uf\ 

4 p2 \n/L\-2ln/L}-l-j j 

- (\ n / L \)2 E E ^^lo^+^E^ 4 ] 

< 4P 2 ER/ 4 ! L " /Z/J ~ 1 1 V o vL (140) 

where the first inequality follows from the Cauchy-Schwarz inequality. As above, it follows from 
Cesaro's mean that this upper bound tends to zero as n tends to infinity. 
It thus remains to show that the last term on the RHS of (136) vanishes as $n$ tends to infinity.
We have for each $j = 0,\ldots,\lfloor n/L\rfloor - 2$



Ln/Lj-l-j 



j+u 



£ a 2 + atL a 2 + P ^ at, L (E [C/ 2 L+1 f/ 2 ] - l) 



Ln/iJ-l-j 



^ E K + p E a « r +p E Q ^ E[t/ 2 i+1 [/ 2 ] -i 



i/=i 

[n/ij-l-j 



6=1 



< 



E (- 2 



Pa 



«/=l 
Ln/iJ 



E[t^ L+ it^]-l 



< 53 (^ 2 + p« (l) ) |e[c/ 2 l+1 c/ 2 ]-i 



(141) 



where the first inequality follows by upper bounding E[f/ 2 £+1 C/ 2 ] — 1 < | E [U* L+1 E/f ] — l|; and 



the second inequality follows by upper bounding J2e=i a £L < J2e=i a tL < J2e=i a (L = ct 
The last term on the RHS of (|136|) is therefore upper bounded by 



Ln/LJ-2 Ln/Lj-l-j 



(Ln/iJl 



E E - 2 + PE^ K + P E«^ 



J=0 L/=l 



«=1 



Lri/LJ-2 L"/iJ 



< 



(L»/ij; 



£ E (Wp« (l) ) |e[c/ 2 l+1 c/ 2 ]-i 



j=0 w=l 



2 (a 4 + Pa 



(L) ^ [n/L\-l 1 



53 E[[/ 2 i+1 [/ 2 ] -1 



(142) 



It follows now from the weakly-mixing property of {Uk} that [51 Thm. 6.1] 

[n/Al. , Ln/iJ. 



lim 



i- ^ |E[[/ 2 i+1 C/ 2 ] -1 = lim -i- |E[f/^ + i^ 2 ] -E[^l+i] E[t/! 2 ] 



n^oo [n/L\ ^ 



= 



so that the last term on the RHS of (136) vanishes as $n$ tends to infinity.

Thus (142), (140), and (139) show that (136) vanishes as $n$ tends to infinity which in turn
shows, along with (130) and (132), that

\[
\lim_{n\to\infty}\mathrm{Var}\left(\frac{1}{\lfloor n/L\rfloor}\|\mathbf{Y}\|^2\right) = 0.
\]

Together with (127), this proves (117). The proof of (118) follows along the same lines.
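The concentration claimed in Lemma 5 can be observed in a small Monte Carlo experiment. The sketch below is an illustration under assumptions that are not part of the proof: the noise process is IID standard Gaussian, the weights are the hypothetical choice $\alpha_\ell = e^{-0.1\ell^2}$ (which satisfies (82)), and the dependence on the past is truncated after `memory` blocks, which is harmless here because the neglected weights are below $10^{-17}$.

```python
import math
import random

def empirical_norms(num_blocks, L, P, sigma2, alpha, rng, memory=10):
    """Return (||Y||^2, ||Z||^2) divided by the number of blocks.

    Only the active symbols X_{kL+1} are simulated; past active symbols that
    lie more than `memory` blocks back are ignored (an approximation).
    """
    xs = []
    sum_y2 = 0.0
    sum_z2 = 0.0
    for k in range(num_blocks):
        past = range(max(0, k - memory), k)
        variance = sigma2 + sum(alpha((k - j) * L) * xs[j] ** 2 for j in past)
        x = rng.gauss(0.0, math.sqrt(P))
        z = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        xs.append(x)
        sum_y2 += (x + z) ** 2
        sum_z2 += z ** 2
    return sum_y2 / num_blocks, sum_z2 / num_blocks

alpha = lambda l: math.exp(-0.1 * l * l)          # hypothetical weights
L, P, sigma2 = 2, 1.0, 1.0
alpha_sum = sum(alpha(l * L) for l in range(1, 50))   # truncated alpha^(L)
y_avg, z_avg = empirical_norms(20000, L, P, sigma2, alpha, random.Random(0))
print(y_avg, sigma2 + P + alpha_sum * P)
print(z_avg, sigma2 + alpha_sum * P)
```

The two empirical averages land close to $\sigma^2 + P + \alpha^{(L)}P$ and $\sigma^2 + \alpha^{(L)}P$, respectively, and the spread shrinks as the number of blocks grows, in line with the vanishing variances established above.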